Internet Engineering Task Force (IETF)                      E. Ivov, Ed.
Request for Comments: 6465                                         Jitsi
Category: Standards Track                                E. Marocco, Ed.
ISSN: 2070-1721                                           Telecom Italia
                                                               J. Lennox
                                                                   Vidyo
                                                           December 2011
        

A Real-time Transport Protocol (RTP) Header Extension for Mixer-to-Client Audio Level Indication

Abstract

This document describes a mechanism for RTP-level mixers in audio conferences to deliver information about the audio level of individual participants. Such audio level indicators are transported in the same RTP packets as the audio data they pertain to.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6465.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology
   3. Protocol Operation
   4. Audio Levels
   5. Signaling Information
   6. Security Considerations
   7. IANA Considerations
   8. Acknowledgments
   9. References
      9.1. Normative References
      9.2. Informative References
   Appendix A. Reference Implementation
      A.1. AudioLevelCalculator.java
        
1. Introduction

"A Framework for Conferencing with the Session Initiation Protocol (SIP)" [RFC4353] presents an overall architecture for multi-party conferencing. Among others, the framework borrows from RTP [RFC3550] and extends the concept of a mixer entity "responsible for combining the media streams that make up a conference, and generating one or more output streams that are delivered to recipients". Every participant would hence receive, in a flat single stream, media originating from all the others.

Using such centralized mixer-based architectures simplifies support for conference calls on the client side, since they would hardly differ from one-to-one conversations. However, the method also introduces a few limitations. The flat nature of the streams that a mixer would output and send to participants makes it difficult for users to identify the original source of what they are hearing.

Mechanisms that allow the mixer to send to participants cues on current speakers (e.g., the contributing source (CSRC) fields in RTP [RFC3550]) only work for speaking/silent binary indications. There are, however, a number of use cases where one would require more detailed information. Possible examples include the presence of background chat/noise/music/typing, someone breathing noisily into their microphone, or other cases where identifying the source of the disturbance would make it easy to remove it (e.g., by sending a private IM to the concerned party asking them to mute their microphone). A more advanced scenario could involve an intense discussion between multiple participants that the user does not personally know. Audio level information would help users better recognize the speakers by associating with them complex (but still human-readable) characteristics like loudness and speed, for example.

One way of presenting such information in a user-friendly manner would be for a conferencing client to attach audio level indicators to the corresponding participant-related components in the user interface. One possible example is displayed in Figure 1, where levels can help users determine that Alice is currently the active speaker, Carol is mute, and Bob and Dave are sending some background noise.

                         ________________________
                        |                        |
                        |  00:42 |  Weekly Call  |
                        |________________________|
                        |                        |
                        |                        |
                        | Alice |======    | (S) |
                        |                        |
                        | Bob   |=         |     |
                        |                        |
                        | Carol |          | (M) |
                        |                        |
                        | Dave  |===       |     |
                        |                        |
                        |________________________|
        

Figure 1: Displaying Detailed Speaker Information to the User by Including Audio Level for Every Participant

Implementing a user interface like the above requires analysis of the media sent from other participants. In a conventional audio conference, this is only possible for the mixer, since all other conference participants are generally receiving a single, flat audio stream and therefore have no immediate way of determining individual audio levels.

This document specifies an RTP extension header that allows such mixers to deliver audio level information to conference participants by including it directly in the RTP packets transporting the corresponding audio data.

The header extension in this document is different than, but complementary to, the one defined in [RFC6464], which defines a mechanism by which clients can indicate to audio mixers the levels of the audio in the packets they send.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

3. Protocol Operation

According to RFC 3550 [RFC3550], a mixer is expected to include in outgoing RTP packets a list of identifiers (CSRC IDs) indicating the sources that contributed to the resulting stream. The presence of such CSRC IDs allows RTP clients to determine, in a binary way, the active speaker(s) at any given moment. The RTP Control Protocol (RTCP) also provides a basic mechanism to map the CSRC IDs to user identities through the CNAME field. More advanced mechanisms can exist, depending on the signaling protocol used to establish and control a conference. In the case of the Session Initiation Protocol [RFC3261], for example, "A Session Initiation Protocol (SIP) Event Package for Conference State" [RFC4575] defines a <src-id> tag that binds CSRC IDs to media streams and SIP URIs.

This document describes an RTP header extension that allows mixers to indicate the audio level of every contributing conference participant (CSRC) in addition to simply indicating their on/off status. This new header extension uses the general mechanism for RTP header extensions as described in [RFC5285].

Each instance of this header contains a list of one-octet audio levels expressed in -dBov, with values from 0 to 127 representing 0 to -127 dBov (see Figures 2 and 3). Appendix A provides a reference implementation indicating one way of obtaining such values from raw audio samples.

Every audio level value pertains to the CSRC identifier located at the corresponding position in the CSRC list. In other words, the first value would indicate the audio level of the conference participant represented by the first CSRC identifier in that packet, and so forth. The number and order of these values MUST therefore match the number and order of the CSRC IDs present in the same packet.
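
As a minimal sketch of this positional rule, a receiver could pair each level with the CSRC at the same index once both lists have been extracted from a packet. The class CsrcLevelPairing and its method are hypothetical illustrations, not part of any library. CSRCs are held in long values only to keep the unsigned 32-bit identifiers positive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper pairing each audio level with the CSRC at the same position. */
final class CsrcLevelPairing {
    /** RFC 3550 allows at most 15 CSRC identifiers per RTP packet. */
    static final int MAX_CSRC = 15;

    /**
     * Returns an insertion-ordered map from CSRC identifier to audio level
     * (0..127, i.e. 0 to -127 dBov). Throws if the counts differ, because the
     * i-th level must describe the i-th CSRC in the same packet.
     */
    static Map<Long, Integer> pair(long[] csrcs, int[] levels) {
        if (csrcs.length > MAX_CSRC) {
            throw new IllegalArgumentException("more than 15 CSRC identifiers");
        }
        if (csrcs.length != levels.length) {
            throw new IllegalArgumentException(
                "level count " + levels.length + " != CSRC count " + csrcs.length);
        }
        Map<Long, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < csrcs.length; i++) {
            result.put(csrcs[i], levels[i]); // same index, same participant
        }
        return result;
    }
}
```

Rejecting a mismatched packet outright is only one possible receiver policy. The text above states only that the counts MUST match.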

When encoding audio level information, a mixer SHOULD include in a packet information that corresponds to the audio data being transported in that same packet. It is important that these values follow the actual stream as closely as possible. Therefore, a mixer SHOULD also calculate the values after the original contributing stream has undergone possible processing, such as level normalization and noise reduction.

It can sometimes happen that a conference involves more than a single mixer. In such cases, each of the mixers MAY choose to relay the CSRC list and audio level information they receive from peer mixers (as long as the total CSRC count remains below 16). Given that the maximum audio level is not precisely defined by this specification, it is likely that in such situations average audio levels would be perceptibly different for the participants located behind the different mixers.
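
The relay rule above can be sketched as appending a peer mixer's (CSRC, level) entries after the local ones, but only while the combined list stays within 15 entries. The class CascadedLevels is hypothetical. Declining to relay anything when the limit would be exceeded is an assumption made for illustration, because the text above does not say what a mixer should do in that case. Levels are copied unchanged, so any calibration differences between mixers remain.

```java
import java.util.Optional;

/** Hypothetical helper for cascaded mixers relaying peer CSRCs and levels. */
final class CascadedLevels {
    static final int MAX_CSRC = 15;

    /** A CSRC list and the audio level list aligned with it. */
    static final class Lists {
        final long[] csrcs;
        final int[] levels;

        Lists(long[] csrcs, int[] levels) {
            this.csrcs = csrcs;
            this.levels = levels;
        }
    }

    /** Returns local entries followed by peer entries, or empty if more than 15 would result. */
    static Optional<Lists> relay(long[] localCsrcs, int[] localLevels,
                                 long[] peerCsrcs, int[] peerLevels) {
        if (localLevels.length != localCsrcs.length || peerLevels.length != peerCsrcs.length) {
            throw new IllegalArgumentException("levels must align with CSRCs");
        }
        int n = localCsrcs.length + peerCsrcs.length;
        if (n > MAX_CSRC) {
            return Optional.empty(); // sketch policy: relay nothing rather than truncate
        }
        long[] csrcs = new long[n];
        int[] levels = new int[n];
        System.arraycopy(localCsrcs, 0, csrcs, 0, localCsrcs.length);
        System.arraycopy(peerCsrcs, 0, csrcs, localCsrcs.length, peerCsrcs.length);
        System.arraycopy(localLevels, 0, levels, 0, localLevels.length);
        System.arraycopy(peerLevels, 0, levels, localLevels.length, peerLevels.length);
        return Optional.of(new Lists(csrcs, levels));
    }
}
```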

4. Audio Levels

The audio level header extension carries the level of the audio in the RTP payload of the packet with which it is associated. This information is carried in an RTP header extension element as defined by "A General Mechanism for RTP Header Extensions" [RFC5285].

The payload of the audio level header extension element can be encoded using either the one-byte or two-byte header defined in [RFC5285]. Figures 2 and 3 show sample audio level encodings with each of these header formats.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  ID   | len=2 |0|   level 1   |0|   level 2   |0|   level 3   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 2: Sample Audio Level Encoding Using the One-Byte Header Format

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      ID       |     len=3     |0|   level 1   |0|   level 2   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |0|   level 3   |    0 (pad)    |               ...
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 3: Sample Audio Level Encoding Using the Two-Byte Header Format

In the case of the one-byte header format, the 4-bit len field is the number of data bytes (i.e., audio level values) transported in this header extension element after the one-byte header, minus one. Therefore, the value zero in this field indicates that one byte of data follows. In the case of the two-byte header format, the 8-bit len field contains the exact number of audio levels carried in the extension. RFC 3550 [RFC3550] only allows RTP packets to carry a maximum of 15 CSRC IDs. Given that audio levels directly refer to CSRC IDs, implementations MUST NOT include more than 15 audio level values. The maximum value allowed in the len field is therefore 14 for the one-byte header format and 15 for the two-byte header format.

Note: Audio levels in this document are defined in the same manner as is audio noise level in the RTP Payload Comfort Noise specification [RFC3389]. In [RFC3389], the overall magnitude of the noise level in comfort noise is encoded into the first byte of the payload, with spectral information about the noise in subsequent bytes. This specification's audio level parameter is defined so as to be identical to the comfort noise payload's noise-level byte.

The magnitude of the audio level itself is packed into the seven least significant bits of the single byte of the header extension, shown in Figures 2 and 3. The least significant bit of the audio level magnitude is packed into the least significant bit of the byte. The most significant bit of the byte is unused and always set to 0.
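
The len-field rules and the 7-bit packing described above can be sketched together as an encoder for a single header extension element. The class CsrcAudioLevelElement is hypothetical. The sketch emits only the element bytes (header plus one byte per level). Padding the whole extension block to a 32-bit boundary, as shown by the "0 (pad)" byte in Figure 3, is left to the surrounding RFC 5285 framing. Requiring at least one level is an assumption, since a packet with no CSRCs has nothing to describe.

```java
/** Hypothetical encoder for one csrc-audio-level header extension element. */
final class CsrcAudioLevelElement {
    /** One-byte form: 4-bit ID (1..14), 4-bit len = count - 1, then one byte per level. */
    static byte[] encodeOneByte(int id, int[] levels) {
        if (id < 1 || id > 14) throw new IllegalArgumentException("one-byte ID must be 1..14");
        checkCount(levels);
        byte[] out = new byte[1 + levels.length];
        out[0] = (byte) ((id << 4) | (levels.length - 1));
        packLevels(levels, out, 1);
        return out;
    }

    /** Two-byte form: 8-bit ID (1..255), 8-bit len = exact count, then one byte per level. */
    static byte[] encodeTwoByte(int id, int[] levels) {
        if (id < 1 || id > 255) throw new IllegalArgumentException("two-byte ID must be 1..255");
        checkCount(levels);
        byte[] out = new byte[2 + levels.length];
        out[0] = (byte) id;
        out[1] = (byte) levels.length;
        packLevels(levels, out, 2);
        return out;
    }

    /** Recovers the 7-bit level from a data byte, ignoring the unused most significant bit. */
    static int decodeLevel(byte b) {
        return b & 0x7F;
    }

    private static void checkCount(int[] levels) {
        // At most 15 CSRCs (RFC 3550), hence at most 15 levels.
        if (levels.length < 1 || levels.length > 15) {
            throw new IllegalArgumentException("need 1..15 audio levels");
        }
    }

    private static void packLevels(int[] levels, byte[] out, int offset) {
        for (int i = 0; i < levels.length; i++) {
            if (levels[i] < 0 || levels[i] > 127) {
                throw new IllegalArgumentException("level must be 0..127 (-dBov)");
            }
            // The level fills the 7 least significant bits; the MSB stays 0.
            out[offset + i] = (byte) levels[i];
        }
    }
}
```

With three levels and ID 1, the one-byte form starts with 0x12 (ID 1, len 2), matching Figure 2. The two-byte form starts with 0x01 0x03, matching Figure 3. With 15 levels, the one-byte len field reaches its stated maximum of 14.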

The audio level is expressed in -dBov, with values from 0 to 127 representing 0 to -127 dBov. dBov is the level, in decibels, relative to the overload point of the system, i.e., the highest-intensity signal encodable by the payload format. (Note: Representation relative to the overload point of a system is particularly useful for digital implementations, since one does not need to know the relative calibration of the analog circuitry.) For example, in the case of u-law (audio/pcmu) audio [ITU.G711], the 0 dBov reference would be a square wave with values +/- 8031. (This translates to 6.18 dBm0, relative to u-law's dBm0 definition in Table 6 of [ITU.G711].)

The audio level for digital silence -- for a muted audio source, for example -- MUST be represented as 127 (-127 dBov), regardless of the dynamic range of the encoded audio format.

The audio level header extension only carries the level of the audio in the RTP payload of the packet with which it is associated, with no long-term averaging or smoothing applied. That level is measured as a root mean square of all the samples in the measured range.
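
Putting the definitions above together: the 0 dBov reference is a square wave at the overload amplitude, and a square wave's RMS equals its amplitude. The level field can therefore be computed as round(-20 * log10(rms / overload)), clamped to 0..127, with digital silence forced to 127. The class LevelSketch is a hypothetical minimal sketch for signed 16-bit linear PCM. The overload value is a caller-supplied assumption, and rounding to the nearest integer is a choice made here. This sketch is separate from the reference implementation in Appendix A and may differ from it in such details.

```java
/** Hypothetical level calculator: RMS of 16-bit linear PCM samples, expressed in -dBov. */
final class LevelSketch {
    /** Digital silence, and anything quieter than -127 dBov, maps to 127. */
    static final int MIN_AUDIO_LEVEL = 127;

    /**
     * @param samples  signed 16-bit linear PCM samples carried in one RTP packet
     * @param overload amplitude of the 0 dBov reference square wave (an assumption)
     * @return level in 0..127, where 0 means 0 dBov and 127 means -127 dBov or silence
     */
    static int level(short[] samples, double overload) {
        if (samples.length == 0) return MIN_AUDIO_LEVEL;
        double sumOfSquares = 0;
        for (short s : samples) {
            double normalized = s / overload; // relative to the overload point
            sumOfSquares += normalized * normalized;
        }
        double rms = Math.sqrt(sumOfSquares / samples.length); // no smoothing across packets
        if (rms == 0) return MIN_AUDIO_LEVEL; // digital silence MUST be 127
        double dBov = 20 * Math.log10(rms); // <= 0 for signals at or below overload
        long level = Math.round(-dBov);
        // Clamp: signals above the overload point report 0, very quiet ones 127.
        return (int) Math.max(0, Math.min(MIN_AUDIO_LEVEL, level));
    }
}
```

A full-scale square wave yields 0, and halving its amplitude yields 6 (about -6 dBov). For u-law, the text above gives +/- 8031 as the reference in G.711's own linear scale. A decoder that expands u-law to 16-bit samples would need an overload value scaled the same way (for example, 8031 * 4 = 32124 if it multiplies by 4). This depends on the decoder and is not stated by this specification.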

To simplify implementation of the encoding procedures described here, this specification provides a sample Java implementation (see Appendix A) of an audio level calculator that helps obtain such values from raw linear Pulse Code Modulation (PCM) audio samples.

5. Signaling Information

The URI for declaring the audio level header extension in a Session Description Protocol (SDP) extmap attribute and mapping it to a local extension header identifier is "urn:ietf:params:rtp-hdrext:csrc-audio-level". There is no additional setup information needed for this extension (i.e., no extension attributes).

An example attribute line in the SDP for a conference might be:

      a=extmap:7 urn:ietf:params:rtp-hdrext:csrc-audio-level
        

The above mapping will most often be provided per media stream (in the media-level section(s) of SDP, i.e., after an "m=" line) or globally if there is more than one stream containing audio level indicators in a session.

Presence of the above attribute in the SDP description of a media stream indicates that RTP packets in that stream, which contain the level extension defined in this document, will be carrying such an extension with an ID of 7.
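
A receiver has to recover the local ID, the optional direction and the URI from such a line. The following is a minimal sketch of that step, assuming the "a=extmap:<id>[/<direction>] <URI> [<attributes>]" layout from [RFC5285]. The class ExtmapLine is hypothetical. Mapping an omitted direction to "sendrecv" follows the treatment of omission in the paragraph on focus entities in this section.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for SDP a=extmap lines, used here for the csrc-audio-level URI. */
final class ExtmapLine {
    static final String CSRC_AUDIO_LEVEL_URI = "urn:ietf:params:rtp-hdrext:csrc-audio-level";

    // a=extmap:<id>[/<direction>] <uri> [<extension attributes>]
    private static final Pattern EXTMAP = Pattern.compile(
        "a=extmap:(\\d+)(?:/(sendonly|recvonly|sendrecv|inactive))?\\s+(\\S+)(?:\\s+.*)?");

    final int id;
    final String direction; // "sendrecv" when omitted in the SDP line
    final String uri;

    private ExtmapLine(int id, String direction, String uri) {
        this.id = id;
        this.direction = direction;
        this.uri = uri;
    }

    /** Returns the parsed line, or empty if the line is not an extmap attribute. */
    static Optional<ExtmapLine> parse(String line) {
        Matcher m = EXTMAP.matcher(line.trim());
        if (!m.matches()) return Optional.empty();
        String dir = m.group(2) == null ? "sendrecv" : m.group(2);
        return Optional.of(new ExtmapLine(Integer.parseInt(m.group(1)), dir, m.group(3)));
    }

    boolean isCsrcAudioLevel() {
        return CSRC_AUDIO_LEVEL_URI.equals(uri);
    }
}
```

Parsing the example line above yields ID 7, which is the value that matching header extension elements would then carry.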

Conferencing clients that support audio level indicators and have no mixing capabilities would not be able to provide content for this audio level extension and would hence have to always include the direction parameter in the "extmap" attribute with a value of "recvonly". Conference focus entities with mixing capabilities can omit the direction or set it to "sendrecv" in SDP offers. Such entities would need to set it to "sendonly" in SDP answers to offers with a "recvonly" parameter and to "sendrecv" when answering other "sendrecv" offers.
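
The answer rules for a mixing focus reduce to a small mapping from the offered direction to the answered one. The following is a minimal sketch. The class and method names are hypothetical, and directions that the paragraph above does not discuss (such as "sendonly" or "inactive" offers) are deliberately rejected rather than guessed.

```java
/** Hypothetical helper choosing the extmap direction a mixing focus puts in its SDP answer. */
final class AudioLevelDirection {
    /**
     * @param offered direction from the offer's extmap line, or null if it was omitted
     * @return "sendonly" for a recvonly offer (e.g. from a non-mixing client), else "sendrecv"
     */
    static String mixerAnswer(String offered) {
        if (offered == null || offered.equals("sendrecv")) {
            return "sendrecv"; // another mixer can both send and receive levels
        }
        if (offered.equals("recvonly")) {
            return "sendonly"; // the client can only consume levels, the focus produces them
        }
        throw new IllegalArgumentException("case not covered by this sketch: " + offered);
    }
}
```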

This specification only defines the use of the audio level extensions in audio streams. They MUST NOT be advertised with other media types, such as video or text, for example.

Figures 4 and 5 show two example offer/answer exchanges between a conferencing client and a focus, and between two conference focus entities.

SDP Offer:

       v=0
       o=alice 2890844526 2890844526 IN IP6 host.example.com
       s=-
       c=IN IP6 host.example.com
       t=0 0
       m=audio 49170 RTP/AVP 0 4
       a=rtpmap:0 PCMU/8000
       a=rtpmap:4 G723/8000
       a=extmap:1/recvonly urn:ietf:params:rtp-hdrext:csrc-audio-level
        

SDP Answer:

       v=0
       o=conf-focus 2890844730 2890844730 IN IP6 focus.example.net
       s=-
       i=A Seminar on the session description protocol
       c=IN IP6 focus.example.net
       t=0 0
       m=audio 52544 RTP/AVP 0
       a=rtpmap:0 PCMU/8000
       a=extmap:1/sendonly urn:ietf:params:rtp-hdrext:csrc-audio-level
        

Figure 4: A Client-Initiated Example SDP Offer/Answer Exchange Negotiating an Audio Stream with One-Way Flow of Audio Level Information

SDP Offer:

       v=0
       o=fr-focus 2890844730 2890844730 IN IP6 focus.fr.example.net
       s=-
       i=Un seminaire sur le protocole de description des sessions
       c=IN IP6 focus.fr.example.net
       t=0 0
       m=audio 49170 RTP/AVP 0
       a=rtpmap:0 PCMU/8000
       a=extmap:1/sendrecv urn:ietf:params:rtp-hdrext:csrc-audio-level
        

SDP Answer:

       v=0
       o=us-focus 2890844526 2890844526 IN IP6 focus.us.example.net
       s=-
       i=A Seminar on the session description protocol
       c=IN IP6 focus.us.example.net
       t=0 0
       m=audio 52544 RTP/AVP 0
       a=rtpmap:0 PCMU/8000
       a=extmap:1/sendrecv urn:ietf:params:rtp-hdrext:csrc-audio-level
        

Figure 5: An Example SDP Offer/Answer Exchange between Two Conference Focus Entities with Mixing Capabilities Negotiating an Audio Stream with Bidirectional Flow of Audio Level Information

6. Security Considerations
6. 安全考虑

This document defines a means of attributing audio level to a particular participant in a conference. An attacker may try to modify the content of RTP packets in a way that would make audio activity from one participant appear to be coming from another participant.

Furthermore, the fact that audio level values would not be protected even in a Secure Real-time Transport Protocol (SRTP) session [RFC3711] might be of concern in some cases where the activity of a particular participant in a conference is confidential. Also, as discussed in [SRTP-VBR-AUDIO], an attacker might be able to infer information about the conversation, possibly with phoneme-level resolution.

Both of the above are concerns that stem from the design of the RTP protocol itself, and they would probably also apply when using CSRC identifiers in the way specified in RFC 3550 [RFC3550]. It is therefore important that, according to the needs of a particular scenario, implementors and deployers consider the use of header extension encryption [SRTP-ENCR-HDR] or a lower-level security and authentication mechanism such as IPsec [RFC4301].

7. IANA Considerations

This document defines a new extension URI in the RTP Compact Header Extensions subregistry of the Real-Time Transport Protocol (RTP) Parameters registry, according to the following data:

      Extension URI: urn:ietf:params:rtp-hdrext:csrc-audio-level
      Description:   Mixer-to-client audio level indicators
      Contact:       emcho@jitsi.org
      Reference:     RFC 6465
        
8. Acknowledgments

Lyubomir Marinov contributed level measurement and rendering code.

Keith Drage, Roni Even, Miguel A. Garcia, John Elwell, Kevin P. Fleming, Ingemar Johansson, Michael Ramalho, Magnus Westerlund, and several others provided helpful feedback over the avt and avtext mailing lists.

Jitsi's participation in this specification is funded by the NLnet Foundation.

9. References
9.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP Header Extensions", RFC 5285, July 2008.

9.2. Informative References

[ITU.G711] International Telecommunication Union, "Pulse Code Modulation (PCM) of Voice Frequencies", ITU-T Recommendation G.711, November 1988.

[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[RFC3389] Zopf, R., "Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN)", RFC 3389, September 2002.

[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.

[RFC4301] Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, December 2005.

[RFC4353] Rosenberg, J., "A Framework for Conferencing with the Session Initiation Protocol (SIP)", RFC 4353, February 2006.

[RFC4575] Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A Session Initiation Protocol (SIP) Event Package for Conference State", RFC 4575, August 2006.

[RFC6464] Lennox, J., Ed., Ivov, E., and E. Marocco, "A Real-time Transport Protocol (RTP) Header Extension for Client-to-Mixer Audio Level Indication", RFC 6464, December 2011.

[SRTP-ENCR-HDR] Lennox, J., "Encryption of Header Extensions in the Secure Real-Time Transport Protocol (SRTP)", Work in Progress, October 2011.

[SRTP-VBR-AUDIO] Perkins, C. and JM. Valin, "Guidelines for the use of Variable Bit Rate Audio with Secure RTP", Work in Progress, July 2011.

Appendix A. Reference Implementation

This appendix contains Java code for a reference implementation of the level calculation and rendering methods. The code is not normative and is by no means the only possible implementation. Its purpose is to help implementors add audio level support to mixers and clients.
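
The level computed by calculateAudioLevel below can be summarized as follows. The symbols are ours, not the reference code's: x_{o+i} stands for samples[offset + i], N for length, and x_ovl for overload.

```latex
\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=0}^{N-1}\left(\frac{x_{o+i}}{x_{\mathrm{ovl}}}\right)^{2}},
\qquad
L =
\begin{cases}
\operatorname{round}\bigl(\max(-127,\ \min(0,\ 20\log_{10}\mathrm{RMS}))\bigr), & \mathrm{RMS} > 0,\\
-127, & \mathrm{RMS} = 0 .
\end{cases}
```

For example, with 16-bit samples (x_ovl = 32767) that all have magnitude 16384, RMS = 16384/32767, which is about 0.5, and 20 log10(0.5) is about -6.02, so L = -6 dBov.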

The Java code contains an AudioLevelCalculator class that calculates the audio level, in dBov (decibels relative to the overload point), of a signal from a given range of samples. Mixers can use it to generate values suitable for the audio level header extension.
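
As a sketch of the final step, the helper below maps dBov values such as those returned by calculateAudioLevel onto header octets. It assumes the mixer-to-client element layout of this document: one octet per contributing source, in the same order as the CSRC list, whose most significant bit is 0 and whose low 7 bits carry the level as -dBov (0 for 0 dBov through 127 for -127 dBov). The class AudioLevelEncoder and its methods are hypothetical and not part of the reference implementation, and the extension element header (ID and length) is deliberately left out.

```java
/**
 * Hypothetical helper, not part of the reference implementation.
 * Assumes one octet per CSRC, in CSRC order: a zero bit followed
 * by a 7-bit level expressed as -dBov.
 */
final class AudioLevelEncoder
{
    /** Maps a level in dBov (-127..0) to the 7-bit -dBov field. */
    static int toHeaderLevel(int dbov)
    {
        int level = -dbov;          // 0 dBov -> 0, -127 dBov -> 127
        if (level < 0)
            level = 0;              // above overload: clamp to 0
        else if (level > 127)
            level = 127;            // below -127 dBov: clamp to 127
        return level;
    }

    /** Packs one level octet per contributing source, in CSRC order. */
    static byte[] encodeLevels(int[] dbovLevels)
    {
        byte[] payload = new byte[dbovLevels.length];
        for (int i = 0; i < dbovLevels.length; i++)
        {
            // Most significant bit stays 0; low 7 bits carry the level.
            payload[i] = (byte) (toHeaderLevel(dbovLevels[i]) & 0x7F);
        }
        return payload;
    }
}
```

For example, encodeLevels(new int[] {0, -6, -127}) yields the octets 0, 6 and 127.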

The implementation is written in Java but does not rely on any Java-specific features, so it can easily be ported to other languages.

A.1. AudioLevelCalculator.java

<CODE BEGINS>

   /*
      Copyright (c) 2011 IETF Trust and the persons identified
      as authors of the code.  All rights reserved.
        
      Redistribution and use in source and binary forms, with or
      without modification, is permitted pursuant to, and subject
      to the license terms contained in, the Simplified BSD License
      set forth in Section 4.c of the IETF Trust's Legal Provisions
      Relating to IETF Documents
      (http://trustee.ietf.org/license-info).
   */

   /**
    * Calculates the audio level of specific samples of a signal
    * relative to overload.
    */
   public class AudioLevelCalculator
   {
       /**
        * Calculates the audio level of a signal with specific
        * <tt>samples</tt>.
        *
        * @param samples  the samples whose audio level we need to
        * calculate.  The samples are specified as an <tt>int</tt>
        * array starting at <tt>offset</tt>, extending <tt>length</tt>
        * number of elements, and each <tt>int</tt> element in the
         * specified range representing a sample whose audio level we
        * need to calculate.  Though a sample is provided in the
        * form of an <tt>int</tt> value, the sample size in bits
        * is determined by the caller via <tt>overload</tt>.
        *
        * @param offset  the offset in <tt>samples</tt> at which the
        * samples start.
        *
        * @param length  the length of the signal specified in
         * <tt>samples</tt>, starting at <tt>offset</tt>.
         *
         * @param overload  the overload (point) of the signal in
         * <tt>samples</tt>.
        * For example, <tt>overload</tt> can be {@link Byte#MAX_VALUE}
        * for 8-bit signed samples or {@link Short#MAX_VALUE} for
        * 16-bit signed samples.
        *
        * @return  the audio level of the specified signal.
        */
       public static int calculateAudioLevel(
           int[] samples, int offset, int length,
           int overload)
       {
           /*
            * Calculate the root mean square (RMS) of the signal.
            */
           double rms = 0;
        
           /* Visit samples[offset] .. samples[offset + length - 1]. */
           for (int end = offset + length; offset < end; offset++)
           {
               double sample = samples[offset];
        
               sample /= overload;
               rms += sample * sample;
           }
           rms = (length == 0) ? 0 : Math.sqrt(rms / length);
        
           /*
            * The audio level is a logarithmic measure of the
            * rms level of an audio sample relative to a reference
            * value and is measured in decibels.
            */
           double db;
        
           /*
            * The minimum audio level permitted.
            */
           final double MIN_AUDIO_LEVEL = -127;
        
           /*
            * The maximum audio level permitted.
            */
           final double MAX_AUDIO_LEVEL = 0;
        
           if (rms > 0)
           {
               /*
                * The "zero" reference level is the overload level,
                * which corresponds to 1.0 in this calculation, because
                * the samples are normalized in calculating the RMS.
                */
               db = 20 * Math.log10(rms);
        
               /*
                * Ensure that the calculated level is within the minimum
                * and maximum range permitted.
                */
               if (db < MIN_AUDIO_LEVEL)
                   db = MIN_AUDIO_LEVEL;
               else if (db > MAX_AUDIO_LEVEL)
                   db = MAX_AUDIO_LEVEL;
           }
           else
           {
               db = MIN_AUDIO_LEVEL;
           }
        
           return (int)Math.round(db);
       }
   }
        
<CODE ENDS>
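
The following self-contained sketch shows how a mixer might obtain the int[] samples that calculateAudioLevel expects from raw audio. It assumes signed 16-bit little-endian PCM; the class PcmLevelExample and its methods are hypothetical. Its level() method restates the listing's calculation in compact form, with the loop bounded by offset + length, so that the sketch compiles on its own.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical usage sketch: decode 16-bit little-endian PCM into
 * int samples and compute the level (dBov) of a sub-range of them.
 */
final class PcmLevelExample
{
    /** Decodes signed 16-bit little-endian PCM, one int per sample. */
    static int[] pcm16LeToSamples(byte[] pcm)
    {
        ByteBuffer buf = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        int[] samples = new int[pcm.length / 2];   // a trailing odd byte is ignored
        for (int i = 0; i < samples.length; i++)
            samples[i] = buf.getShort();           // sign-extended to int
        return samples;
    }

    /** Level in dBov of samples[offset .. offset + length - 1]. */
    static int level(int[] samples, int offset, int length, int overload)
    {
        double sum = 0;
        for (int i = offset; i < offset + length; i++)
        {
            double s = (double) samples[i] / overload;  // normalize to overload
            sum += s * s;
        }
        double rms = (length == 0) ? 0 : Math.sqrt(sum / length);
        if (rms <= 0)
            return -127;                                // silence or no samples
        double db = 20 * Math.log10(rms);
        db = Math.max(-127, Math.min(0, db));           // clamp to -127..0
        return (int) Math.round(db);
    }
}
```

With the four decoded samples {0, 0, 16384, -16384}, level(samples, 2, 2, Short.MAX_VALUE) returns -6, while level(samples, 0, 2, Short.MAX_VALUE) returns -127 because those two samples are silent. Had the loop been bounded by length rather than offset + length, the first call would examine no samples and also report -127.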

Authors' Addresses

   Emil Ivov (editor)
   Jitsi
   Strasbourg  67000
   France

   EMail: emcho@jitsi.org
        
   Enrico Marocco (editor)
   Telecom Italia
   Via G. Reiss Romoli, 274
   Turin  10148
   Italy

   EMail: enrico.marocco@telecomitalia.it
        
   Jonathan Lennox
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ  07601
   US

   EMail: jonathan@vidyo.com