Network Working Group                                        Y. Hiwasaki
Request for Comments: 5686                                     H. Ohmuro
Category: Standards Track                                NTT Corporation
                                                            October 2009
        

RTP Payload Format for mU-law EMbedded Codec for Low-delay IP Communication (UEMCLIP) Speech Codec

Abstract

This document describes the RTP payload format of a mU-law EMbedded Coder for Low-delay IP communication (UEMCLIP), an enhanced speech codec of ITU-T G.711. The bitstream has a scalable structure with an embedded u-law bitstream, also known as PCMU, thus providing a handy transcoding operation between narrowband and wideband speech.

Status of This Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the BSD License.

This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.

Table of Contents

   1. Introduction
      1.1. Terminology
   2. Media Format Background
   3. Payload Format
      3.1. RTP Header Usage
      3.2. Multiple Frames in an RTP Packet
      3.3. Payload Data
           3.3.1. Main Header
           3.3.2. Sub-Layer
   4. Transcoding between UEMCLIP and G.711
   5. Congestion Control Considerations
   6. Payload Format Parameters
      6.1. Media Type Registration
      6.2. Mapping to SDP Parameters
           6.2.1. Mode Specification
      6.3. Offer-Answer Model Considerations
           6.3.1. Offer-Answer Guidelines
           6.3.2. Examples
   7. Security Considerations
   8. IANA Considerations
   9. References
      9.1. Normative References
      9.2. Informative References
        
1. Introduction

This document specifies the payload format for sending UEMCLIP-encoded (mU-law EMbedded Coder for Low-delay IP communication) speech using the Real-time Transport Protocol (RTP) [RFC3550]. UEMCLIP is a proprietary codec that enhances u-law ITU-T G.711 [ITU-T-G.711]. It is designed to help the market transition smoothly towards the forthcoming wideband communication environment while keeping the media transcoding load very small for existing terminals, in which the implementation of G.711 is mandatory.

It should be noted that, generally speaking, codecs are negotiated and changed using an SDP exchange. Also, [RFC3550] defines general RTP mixer and translator models, where media transcoding may not take place at the node. For those cases, the design concept of the embedded structure is not useful. However, there are other cases when costly transcoding is unavoidable in commonly deployed types of Multi-point Control Units (MCUs), which terminate media and RTCP packets [RFC5117], and when narrowband and wideband terminals coexist. This embedded bitstream structure can reduce the media transcoding to a simple bitstream truncation.

The background and the basic idea of the media format are described in Section 2. The details of the payload format are given in Section 3. The transcoding issues with G.711 are discussed in Section 4, and the considerations for congestion control are in Section 5. In Section 6, the payload format parameters for a media type registration for the UEMCLIP RTP payload format and Session Description Protocol (SDP) mappings are provided. The security considerations and IANA considerations are dealt with in Section 7 and Section 8, respectively.

1.1. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. Media Format Background

UEMCLIP is an enhanced version of u-law ITU-T G.711, otherwise known as PCMU [RFC4856]. It is targeted at Voice over Internet Protocol (VoIP) applications, and its main goal is to provide a wideband communication platform that is highly interoperable with existing terminals equipped with G.711 and to stimulate the market to gradually shift to using wideband communication. In widely deployed multi-point conferencing systems, the packets usually go through RTCP-terminating (RTP Control Protocol) MCUs, "Topo-RTCP-terminating-MCU" as defined in [RFC5117]. Because the G.711 bitstream is embedded in the bitstream, costly media transcoding can be avoided in this case.

This document does not discuss the implementation details of the encoder and decoder, but only describes the bitstream format.

Because of its scalable nature, there are a number of sub-bitstreams (sub-layers) in a UEMCLIP bitstream. By choosing appropriate sub-layers, the codec can adapt to the following requirements:

o Sampling frequency,

o Number of channels,

o Speech quality, and

o Bit-rate.

The UEMCLIP codec operates on 20-ms frames and includes three sub-coders, as shown in Table 1. The core layer is u-law G.711 at 64 kbit/s, and the other two are quality and bandwidth enhancement layers with a bit-rate of 16 kbit/s each.

   +-------+---------------------+----------+--------------------------+
   | Layer | Description         | Bit-rate | Coding algorithm         |
   |       |                     | [kbit/s] |                          |
   +-------+---------------------+----------+--------------------------+
   |   a   | G.711 core          |       64 | u-law PCM                |
   |       |                     |          |                          |
   |   b   | Lower-band          |       16 | Time domain block        |
   |       | enhancement         |          | quantization             |
   |       |                     |          |                          |
   |   c   | Higher-band         |       16 | MDCT block quantization  |
   +-------+---------------------+----------+--------------------------+
        

Table 1: Sub-Layer Description

Based on these sub-layers, the UEMCLIP codec operates in four modes as shown in Table 2. Here, "Ch" is the number of channels and "Fs" is the sampling frequency in kHz. It should be noted that the current version only supports single-channel operation and there might be future extensions with multi-channel capabilities. The absent Modes 2 and 5 are reserved for possible future extension to 32 kHz sampling modes. As the mode definition is expected to grow, any other modes not defined in this table MUST NOT be used for compatibility and interoperability reasons.

   +------+----+----+-------+-------+-------+-------------+------------+
   | Mode | Ch | Fs | Layer | Layer | Layer |    Bit-rate |      Total |
   |      |    |    |   a   |   b   |   c   | w/o headers |   bit-rate |
   |      |    |    |       |       |       |    [kbit/s] |   [kbit/s] |
   +------+----+----+-------+-------+-------+-------------+------------+
   |   0  |  1 |  8 |   x   |   -   |   -   |          64 |       67.2 |
   |      |    |    |       |       |       |             |            |
   |   1  |  1 | 16 |   x   |   -   |   x   |          80 |       84.0 |
   |      |    |    |       |       |       |             |            |
   |   2  |  - |  - |   -   |   -   |   -   |           - |          - |
   |      |    |    |       |       |       |             |            |
   |   3  |  1 |  8 |   x   |   x   |   -   |          80 |       84.0 |
   |      |    |    |       |       |       |             |            |
   |   4  |  1 | 16 |   x   |   x   |   x   |          96 |      100.8 |
   |      |    |    |       |       |       |             |            |
   |   5  |  - |  - |   -   |   -   |   -   |           - |          - |
   +------+----+----+-------+-------+-------+-------------+------------+
        

Table 2: Mode Description

The UEMCLIP bitstream contains internal headers and other side-information apart from the layer data. This results in a total bit-rate larger than the sum of the layers shown in the above table. The details of the internal headers and auxiliary information are described in Section 3.3.1.

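As a rough cross-check of the "Total bit-rate" column in Table 2, the following Python sketch adds the header overhead to the layer bit-rates. It assumes one 6-byte main header per frame (Section 3.3.1), one 2-byte sub-header per sub-layer (Section 3.3.2), and 20-ms frames. The names MODE_LAYERS, frame_bytes, and total_bitrate_kbps are illustrative only and are not defined by this document.

```python
# Illustrative sketch: reproduces the "Total bit-rate" column of Table 2.
FRAME_MS = 20            # frame length
MAIN_HEADER_BYTES = 6    # main header (MH), Section 3.3.1
SUB_HEADER_BYTES = 2     # CI/FI/QI/R4 byte plus SB byte, Section 3.3.2
LAYER_KBPS = {"a": 64, "b": 16, "c": 16}             # Table 1
MODE_LAYERS = {0: "a", 1: "ac", 3: "ab", 4: "abc"}   # Table 2


def frame_bytes(mode):
    """Size in bytes of one UEMCLIP frame for a mode defined in Table 2."""
    size = MAIN_HEADER_BYTES
    for layer in MODE_LAYERS[mode]:
        # kbit/s times ms gives bits; divide by 8 for bytes of layer data.
        size += SUB_HEADER_BYTES + LAYER_KBPS[layer] * FRAME_MS // 8
    return size


def total_bitrate_kbps(mode):
    """Total bit-rate in kbit/s, including MH and sub-headers."""
    return frame_bytes(mode) * 8 / FRAME_MS   # bits per ms equals kbit/s


if __name__ == "__main__":
    for mode in sorted(MODE_LAYERS):
        print(mode, frame_bytes(mode), total_bitrate_kbps(mode))
```

Under these assumptions, the per-frame sizes come out as 168, 210, 210, and 252 bytes for Modes 0, 1, 3, and 4, which matches the 67.2, 84.0, 84.0, and 100.8 kbit/s totals in Table 2.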

Defining the sampling frequency and the number of channels does not result in a singular mode, i.e., there can be multiple modes for the same sampling frequency or number of channels. The supported modes would differ between implementations; thus, the sender and the receiver must negotiate what mode to use for transmission.

3. Payload Format

As an RTP payload, the UEMCLIP bitstream can contain one or more frames as shown in Figure 1.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      RTP Header                               |
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    |                                                               |
    |                 one or more frames of UEMCLIP                 |
    |                                                               |
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
        

Figure 1: RTP Payload Format

The UEMCLIP bitstream has a scalable structure; thus, it is possible to reconstruct the signal by decoding a part of it. A UEMCLIP frame is composed of a main header (MH) followed by one or more (up to three) sub-layers (SLs) as shown in Figure 2.

                            +--+-------+//-+
                            |MH| SL #1 |...|
                            +--+-------+//-+
        

Figure 2: A UEMCLIP Frame (Bitstream Format)

As a sub-layer, the core layer, i.e., "Layer a", MUST always be included. It should be noted that the core layer may or may not immediately follow the MH field. The decoder MUST always refer to the layer indices for proper decoding because the order of the sub-layers is arbitrary.

The UEMCLIP bitstream does not explicitly include the following information: mode and sampling frequency (Fs). As described before, this information MUST be exchanged while establishing a connection, for example, by means of SDP.

3.1. RTP Header Usage

Each RTP packet starts with a fixed RTP header, as explained in [RFC3550]. The following fields of the RTP fixed header used specifically for UEMCLIP streams are emphasized:

Payload type: The assignment of an RTP payload type for this packet format is outside the scope of this document; however, it is expected that a payload type in the dynamic range shall be assigned.

Timestamp: This encodes the sampling instant of the first speech signal sample in the RTP data packet. For UEMCLIP streams, the RTP timestamp MUST advance based on a clock at either 8000 or 16000 Hz. In cases where the audio sampling rate can change during a session, the RTP timestamp rate MUST be equal to the maximum rate (in Hz) given in the mode range (see Section 6.2.1). This implies that the RTP timestamp rate for a UEMCLIP payload type MUST NOT change during a session. For example, for a UEMCLIP stream with 8-kHz audio sampling, where a transition to a 16-kHz audio sampling mode is allowed, the RTP timestamp must always advance using the 16-kHz clock rate. For a fixed audio sampling mode, the RTP timestamp rate should be either 8 or 16 kHz, depending on the sampling rate.

Marker bit: If the codec is used for applications with discontinuous transmission (DTX, or silence compression), the first packet after a silence period during which packets have not been transmitted contiguously SHOULD have the marker bit in the RTP data header set to one. The marker bit in all other packets MUST be zero. Applications without DTX MUST set the marker bit to zero.

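The timestamp rule above can be illustrated with a short Python sketch. It assumes the negotiated mode range is available as a list of mode numbers and takes the sampling frequencies from Table 2. The helper names rtp_clock_rate and timestamp_increment are hypothetical and are not part of this specification.

```python
# Sampling frequency in Hz for each mode defined in Table 2.
MODE_FS_HZ = {0: 8000, 1: 16000, 3: 8000, 4: 16000}
FRAME_MS = 20  # UEMCLIP frame length


def rtp_clock_rate(allowed_modes):
    """RTP timestamp rate: the maximum sampling rate in the mode range."""
    return max(MODE_FS_HZ[m] for m in allowed_modes)


def timestamp_increment(allowed_modes, frames_per_packet=1):
    """Timestamp ticks between consecutive packets sent back to back."""
    ticks_per_frame = rtp_clock_rate(allowed_modes) * FRAME_MS // 1000
    return ticks_per_frame * frames_per_packet


if __name__ == "__main__":
    # Only 8-kHz modes allowed: the clock runs at 8000 Hz.
    print(timestamp_increment([0, 3]))
    # A 16-kHz mode is allowed: the clock runs at 16000 Hz even while
    # Mode 0 (8-kHz audio) frames are being sent.
    print(timestamp_increment([0, 1]))
```

In this sketch the clock rate depends only on the negotiated mode range, not on the mode of the frame being sent, which is what keeps the timestamp rate constant for the whole session.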

3.2. Multiple Frames in an RTP Packet

More than one UEMCLIP frame may be included in a single RTP packet by a sender. However, senders have the following additional restrictions:

o A single RTP packet SHOULD NOT include more UEMCLIP frames than will fit in the path MTU.

o All frames contained in a single RTP packet MUST be of the same mode.

o Frames MUST NOT be split between RTP packets.

It is RECOMMENDED that the number of frames contained within an RTP packet be consistent with the application. Since UEMCLIP is designed for telephony applications where delay has a great impact on quality, fewer frames per packet, and thus lower delay, are preferable.

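As a hedged illustration of the path MTU restriction and the delay trade-off, the Python sketch below estimates how many frames fit in one packet and the resulting packetization delay. The frame sizes are derived from Tables 1 and 2 plus the 6-byte main header and 2-byte sub-headers of Section 3.3. The 1500-byte MTU and the 40 bytes of IPv4, UDP, and fixed RTP headers (no CSRC entries or header extension) are assumptions chosen for the example, not values from this document.

```python
FRAME_MS = 20
# Bytes per frame and mode: 6-byte MH plus, for each layer, a 2-byte
# sub-header and the layer data (160 bytes for layer a, 40 for b and c).
FRAME_BYTES = {0: 168, 1: 210, 3: 210, 4: 252}
IP_UDP_RTP_BYTES = 20 + 8 + 12   # assumed IPv4 + UDP + fixed RTP header


def max_frames_per_packet(mode, path_mtu=1500):
    """Largest number of whole frames that fits in the assumed path MTU."""
    return (path_mtu - IP_UDP_RTP_BYTES) // FRAME_BYTES[mode]


def packetization_delay_ms(frames_per_packet):
    """Audio duration that must be buffered before a packet can be sent."""
    return frames_per_packet * FRAME_MS


if __name__ == "__main__":
    for mode in sorted(FRAME_BYTES):
        n = max_frames_per_packet(mode)
        print(mode, n, packetization_delay_ms(n))
```

The upper bound from the MTU is usually far larger than a telephony application would want: under these assumptions five Mode 4 frames fit in a packet, but that already implies 100 ms of packetization delay.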

3.3. Payload Data

In a UEMCLIP bitstream, all numbers are encoded in network byte order.

3.3.1. Main Header

The main header (MH) is placed at the top of a frame and has a size of 6 bytes. The content of the main header is shown in Figure 3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      MX       |                      PC                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          PC(cont'd)           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 3: UEMCLIP Main Header Format (MH)

Mixing information (MX): 8 bits

Mixing information field. This field is only relevant when Topo-RTCP-terminating-MCUs are utilized to interpret these fields. See Section 3.3.1.1 for details of the fields.

Packet-loss Concealment information (PC): 40 bits

Packet-loss concealment (PLC) information field. See Section 3.3.1.2.

3.3.1.1. Mixing Information Field
                            0 1 2 3 4 5 6 7
                           +-+-+-+-+-+-+-+-+
                           |C|R|V|   PW1   |
                           |1|1|1|         |
                           +-+-+-+-+-+-+-+-+
        

Figure 4: Mixing Information Field (MX)

Check bit #1 (C1): 1 bit

Validity flag of V1 and PW1. This bit being "1" indicates that both parameters are valid, and "0" indicates that the parameters should be ignored. If any of these parameters is invalid, this bit should be set to "0". This flag is mainly intended for a UEMCLIP-conscious Topo-RTCP-terminating-MCU. This flag should be set to "0" in case of upward transcoding from G.711 (see Section 4).

Reserved bit #1 (R1): 1 bit

This bit should be ignored. The default of this bit is 0.

VAD flag #1 (V1): 1 bit

Voice activity detection flag of the current frame, designed to be used for MCU operations. This flag being "1" indicates that the frame is an active (voice) segment, and "0" indicates that it is an inactive (non-voice) or a silent segment. This flag is specifically designed for mixing information. DTX judgment based on this flag is not recommended.

Power #1 (PW1): 5 bits

      Signal power code of the current frame.  The code is obtained by
      calculating a root mean square (RMS) of "Layer a" and encoding
      this RMS using G.711 u-law [ITU-T-G.711].  Denoting the encoded
      RMS as R, then PW1 is obtained by PW1 = ((~R)>>2) & 0x1F, where
      "~", ">>", "&" are one's complement arithmetic, right SHIFT, and
      bitwise AND operators, respectively.
        
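The MX byte layout of Figure 4 and the PW1 formula can be sketched in Python as follows. The sketch assumes the usual RFC diagram convention that bit 0 is the most significant bit, and it takes the u-law-encoded RMS value R as an input rather than computing it. The function names are hypothetical.

```python
def pw1_from_ulaw_rms(r):
    """PW1 = ((~R) >> 2) & 0x1F for an 8-bit u-law code R."""
    # Mask to 8 bits first so Python's unbounded ints behave like a byte.
    return ((~r & 0xFF) >> 2) & 0x1F


def pack_mx(c1, v1, pw1):
    """Build the MX byte: C1 | R1 | V1 | PW1 (R1 is left at its default 0)."""
    return ((c1 & 1) << 7) | ((v1 & 1) << 5) | (pw1 & 0x1F)


def parse_mx(mx):
    """Split an MX byte into its fields."""
    return {
        "C1": (mx >> 7) & 1,   # validity of V1 and PW1
        "R1": (mx >> 6) & 1,   # reserved, to be ignored
        "V1": (mx >> 5) & 1,   # VAD flag for MCU use
        "PW1": mx & 0x1F,      # 5-bit signal power code
    }


if __name__ == "__main__":
    mx = pack_mx(c1=1, v1=1, pw1=pw1_from_ulaw_rms(0x80))
    print(hex(mx), parse_mx(mx))
```

A receiver would be expected to use V1 and PW1 only when C1 is "1", as described for the check bit above.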
        
3.3.1.2. PLC Information Field
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |C|R2 |V|   K   |U|     P1      |U|     P2      |      PW2      |
   |2|   |2|       |1|             |2|             |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      R3       |
   |               |
   +-+-+-+-+-+-+-+-+
        

Figure 5: PLC Information Field (PC)

Check bit #2 (C2): 1 bit

Validity flag of V2, K, U1, P1, U2, P2, and PW2. If the flag is "1", it means that all these parameters are valid, and "0" means that the parameters should be ignored. If any of these parameters is invalid, this bit should be set to "0". Similarly to C1, this flag should be set to "0" in case of upward transcoding from G.711 (see Section 4).

Reserved bit #2 (R2): 2 bits

These bits should be ignored. The default value of these bits is 0.

VAD flag #2 (V2): 1 bit

Voice activity detection flag of the current frame, designed to be used for packet-loss concealment. This might not be the same as V1 in the mixing information, and might not be synchronous to the marker bit in the RTP header. DTX judgment based on this flag is not recommended.

Frame indicator (K): 4 bits

This value indicates the frame offset of U2, P2, and PW2. To maintain speech quality, it is preferable to carry the speech feature parameters used as PLC information in a frame other than the one they describe; this frame offset value identifies the frame with which the parameters are associated. The value ranges between "0" and "15". If the current frame number is N, for example, the value K indicates that U2, P2, and PW2 are associated with frame N-K. The frame indicator is equal to the difference in the RTP sequence number when one UEMCLIP frame is contained in a single RTP packet.

V/UV flag #1 (U1): 1 bit

Voiced/Unvoiced signal indicator of the current frame. This flag being "0" indicates that the frame is a voiced signal segment, and "1" indicates that it is an unvoiced signal segment.

Pitch lag #1 (P1): 7 bits

Pitch code of the current frame. The actual pitch lag is calculated as P1+20 samples at an 8-kHz sampling rate. The pitch lag must satisfy 20 <= pitch lag <= 120, so codes ranging between "0x65" and "0x7F" are not used. To obtain the pitch lag, any pitch estimation method can be used, such as the one used in G.711 Appendix I [ITU-T-G.711Appendix1].

V/UV flag #2 (U2): 1 bit

Voiced/Unvoiced signal indicator of the offset frame. This flag being "0" indicates that the frame is a voiced signal segment, and "1" indicates that it is an unvoiced signal segment. The offset value is defined as K.

Pitch lag #2 (P2): 7 bits

Pitch code of the offset frame. The offset value is defined as K. The calculation method is identical to that of "P1", except that it is based on the signal of the offset frame.

Power #2 (PW2): 8 bits

Signal power code of the offset frame. The offset value is defined as K.

Reserved bits #3 (R3): 8 bits

These bits should be ignored. The default value of all bits is "0".

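A minimal Python sketch of splitting the 40-bit PC field into the sub-fields of Figure 5 is given below. It assumes the usual RFC diagram convention that bit 0 is the most significant bit of the first byte. The names parse_pc and pitch_lag are hypothetical helpers, not part of this specification.

```python
def parse_pc(pc):
    """Split the 5-byte PC field into its sub-fields (Figure 5)."""
    if len(pc) != 5:
        raise ValueError("PC field must be exactly 5 bytes (40 bits)")
    b0, b1, b2, pw2, r3 = pc   # iterating over bytes yields integers
    return {
        "C2": (b0 >> 7) & 1,    # validity of V2, K, U1, P1, U2, P2, PW2
        "R2": (b0 >> 5) & 0x3,  # reserved, to be ignored
        "V2": (b0 >> 4) & 1,    # VAD flag for packet-loss concealment
        "K": b0 & 0x0F,         # frame offset for U2, P2, and PW2
        "U1": (b1 >> 7) & 1,    # 0 = voiced, 1 = unvoiced (current frame)
        "P1": b1 & 0x7F,        # pitch code of the current frame
        "U2": (b2 >> 7) & 1,    # 0 = voiced, 1 = unvoiced (frame N-K)
        "P2": b2 & 0x7F,        # pitch code of frame N-K
        "PW2": pw2,             # signal power code of frame N-K
        "R3": r3,               # reserved, to be ignored
    }


def pitch_lag(code):
    """Pitch lag in samples at 8 kHz for a 7-bit pitch code (P1 or P2)."""
    if not 0 <= code <= 0x64:
        raise ValueError("pitch codes 0x65 to 0x7F are not used")
    return code + 20


if __name__ == "__main__":
    fields = parse_pc(bytes([0x93, 0x28, 0xE4, 0x55, 0x00]))
    print(fields, pitch_lag(fields["P1"]), pitch_lag(fields["P2"]))
```

With one frame per packet, a receiver concealing a lost packet with sequence number S could look for these parameters in a later packet with sequence number S+K, provided C2 is "1" in that packet.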

3.3.2. Sub-Layer

A sub-layer (SL) consists of a sub-header followed by the layer bitstream, as shown in Figure 6. The sub-header indicates the layer location and the number of bytes.

     0                   1                   2
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7   . . .
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+//-+-+-+
    |CI |FI |QI |R4 |      SB       |               LD         ...  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+//-+-+-+
        

Figure 6: Sub-Layer Format (SL)

Channel index (CI): 2 bits

Indicates the channel number. For all modes given in Table 2, this should be "0". Details are given in Table 3.

Frequency index (FI): 2 bits

Indicates the frequency number. "0" means that the layer is in the base frequency band, and a higher number means that the layer is in the respective frequency band. Details are given in Table 3.

Quality index (QI): 2 bits

Indicates the quality layer number. "0" means that the layer is in the base layer, and a higher number means that the layer is in the respective quality layer. Details are given in Table 3.

Reserved #4 (R4): 2 bits

Not used (reserved). The default value is "0".

Sub-layer Size (SB): 8 bits

Indicates the size, in bytes, of the sub-layer data that follows.

Layer Data (LD): SB*8 bits

The actual sub-layer data.

For all the layers shown in Table 1, the layer indices are shown in Table 3.

                         +-------+----+----+----+
                         | Layer | CI | FI | QI |
                         +-------+----+----+----+
                         |   a   |  0 |  0 |  0 |
                         |       |    |    |    |
                         |   b   |  0 |  0 |  1 |
                         |       |    |    |    |
                         |   c   |  0 |  1 |  0 |
                         +-------+----+----+----+
        

Table 3: Layer Indices
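As an illustration of the sub-header layout in Figure 6 and the indices in Table 3, the two header bytes could be unpacked as sketched below. The sketch assumes the usual RFC diagram convention that bit 0 is the most significant bit of the first byte; the class and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class SubLayerHeader(NamedTuple):
    ci: int  # channel index, 2 bits
    fi: int  # frequency index, 2 bits
    qi: int  # quality index, 2 bits
    r4: int  # reserved, 2 bits
    sb: int  # size of the following layer data in bytes


# (CI, FI, QI) -> layer name, copied from Table 3.
LAYER_NAMES = {(0, 0, 0): "a", (0, 0, 1): "b", (0, 1, 0): "c"}


def parse_sublayer_header(data: bytes) -> SubLayerHeader:
    """Decode the 2-byte sub-header, assuming bit 0 is the MSB of byte 0."""
    if len(data) < 2:
        raise ValueError("sub-header needs at least 2 bytes")
    first = data[0]
    return SubLayerHeader(
        ci=(first >> 6) & 0x3,
        fi=(first >> 4) & 0x3,
        qi=(first >> 2) & 0x3,
        r4=first & 0x3,
        sb=data[1],
    )


def layer_name(header: SubLayerHeader) -> Optional[str]:
    """Return the Table 3 layer name, or None for an unknown index."""
    return LAYER_NAMES.get((header.ci, header.fi, header.qi))
```

A decoder would typically skip or reject a sub-layer for which `layer_name` returns `None`.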

4. Transcoding between UEMCLIP and G.711

As given in Section 2, the u-law-encoded G.711 bitstream (Layer a) is the core layer of a UEMCLIP bitstream, and is always embedded. This means that media transcoding from the UEMCLIP bitstream to G.711 does not have to undergo decoding and re-encoding procedures; simple extraction suffices. However, this does not apply to the reverse procedure, i.e., transcoding from G.711 to UEMCLIP, because the auxiliary information in the main header (MH) must be assigned separately. It should be noted that this media transcoding is useful for a Media Translator (Topo-Media-Translator) or a Point-to-Multipoint Using RTCP Terminating MCU (Topo-RTCP-terminating-MCU) in [RFC5117], and all the requirements given there apply. This means that a transcoding device of this sort MUST rewrite RTCP packets, together with the RTP media packets.

The transcoding from UEMCLIP to u-law G.711 can be done easily by finding the appropriate sub-layer. Within a frame, the transcoder should look for a sub-layer with a layer index of "0x00"; the subsequent LD, which has a size of SB*8 bits, is the actual G.711 bitstream data (UEMCLIP has a 20-ms frame; thus, SB=160). It should be noted that the transcoder should not always expect the core layer to be located right after the main header.

On the other hand, the transcoding from G.711 to UEMCLIP is not entirely straightforward. Since there are no means to generate enhancement sub-layers, a G.711 bitstream can only be converted to a UEMCLIP Mode 0 bitstream. If the original G.711 bitstream is encoded in A-law, it should first be converted to u-law to become the core layer. Because a UEMCLIP frame size is 20 ms, a u-law-encoded G.711 bitstream MUST be a 160-sample chunk to become a core layer. For the main header contents, when a UEMCLIP encoder is not available, the transcoder should follow these guidelines:

o The check bits for mixing and PLC (C1 and C2) are set to 0.

o The reserved bits (R1 to R3) in MH are set to respective default values.

For the core layer (i.e., u-law G.711 bitstream), it should have the following sub-layer header:

o All CI, FI, QI, and R4 MUST be 0.

o Sub-layer size (SB) MUST be 160 for a 20-ms frame.
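The two transcoding directions described in this section can be sketched as follows. This is only an illustration under stated assumptions: the input to `extract_g711` is the sub-layer portion of a single frame with the main header already removed, the main header itself is not built here, and the function names are hypothetical. The sketch ignores the reserved R4 bits when matching the core layer index.

```python
CORE_LAYER_BYTES = 160  # 20 ms of 8-kHz u-law samples, one byte per sample


def extract_g711(sublayers: bytes) -> bytes:
    """Return the u-law core layer found among the sub-layers of one frame."""
    pos = 0
    while pos + 2 <= len(sublayers):
        index, sb = sublayers[pos], sublayers[pos + 1]
        start, end = pos + 2, pos + 2 + sb
        if end > len(sublayers):
            raise ValueError("sub-layer size runs past the end of the frame")
        # CI, FI, and QI occupy the upper six bits of the first byte.
        if (index & 0xFC) == 0x00:
            if sb != CORE_LAYER_BYTES:
                raise ValueError("core layer must carry 160 bytes")
            return sublayers[start:end]
        pos = end  # the core layer is not necessarily the first sub-layer
    raise ValueError("no core layer found in this frame")


def build_core_sublayer(ulaw_frame: bytes) -> bytes:
    """Wrap 160 u-law samples in a sub-layer with CI=FI=QI=R4=0 and SB=160."""
    if len(ulaw_frame) != CORE_LAYER_BYTES:
        raise ValueError("a core layer needs exactly 160 u-law samples")
    return bytes([0x00, CORE_LAYER_BYTES]) + ulaw_frame
```

The bounds check before slicing matters because SB comes from the network and may be corrupted.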

5. Congestion Control Considerations

The general congestion control considerations for transporting RTP data also apply to UEMCLIP over RTP [RFC3550] as well as any applicable RTP profile like Audio-Visual Profile (AVP) [RFC3551].

The bandwidth of a UEMCLIP bitstream can be reduced by changing to lower-bit-rate modes. The embedded layer structure of UEMCLIP may help to control congestion, when dynamic mode changing (see Section 6.2.1) is available, and the range of modes is obtained by offer-answer negotiation as given in Section 6.3. It should be noted that this involves proper RTCP handling when the bit-rate is modified in an RTP translator or a mixer [RFC3550].

Packing more frames in each RTP payload can reduce the number of packets sent, and hence the overhead from IP/UDP/RTP headers, at the expense of increased delay and reduced error robustness against packet losses. It should be treated with care because increased delay means reduced quality.
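The trade-off above can be made concrete with a rough calculation. The sketch below assumes IPv4 without options (20 bytes), UDP (8 bytes), and a fixed RTP header without CSRC entries or extensions (12 bytes); these sizes and the function names are assumptions of the example, not values taken from this document.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes
RTP_HEADER = 12   # bytes, assuming no CSRC list and no header extension
FRAME_MS = 20     # UEMCLIP frame length in milliseconds


def header_overhead_bps(frames_per_packet: int) -> float:
    """Bit rate spent on IP/UDP/RTP headers for a given packing factor."""
    if frames_per_packet < 1:
        raise ValueError("at least one frame per packet is required")
    packets_per_second = 1000 / (frames_per_packet * FRAME_MS)
    header_bits = (IPV4_HEADER + UDP_HEADER + RTP_HEADER) * 8
    return header_bits * packets_per_second


def packetization_delay_ms(frames_per_packet: int) -> int:
    """Audio duration that must be buffered before a packet can be sent."""
    return frames_per_packet * FRAME_MS
```

Under these assumptions, one frame per packet costs 16 kbit/s of header overhead, while three frames per packet cut that to about 5.3 kbit/s at the price of 60 ms of packetization delay.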

6. Payload Format Parameters

6.1. Media Type Registration

This registration is done using the template defined in [RFC4288] and following [RFC4855].

Media type name: audio

Media subtype name: UEMCLIP

Required parameters:

Rate: Defines the sampling rate, and it MUST be either 8000 or 16000. See Section 6.2.1 "Mode specification" of RFC 5686 (this RFC) for details.

Optional parameters:

ptime: See RFC 4566 [RFC4566].

maxptime: See RFC 4566 [RFC4566].

mode: Indicates the range of dynamically changeable modes during a session. Possible values are a comma-separated list of modes from the supported mode set: 0, 1, 3, and 4. If only one mode is specified, it means that the mode must not be changed during the session. When not specified, the mode transmission defaults to a singular mode as specified in Table 4. See Section 6.2.1 "Mode specification" of RFC 5686 (this RFC) for details.

Encoding considerations: This media type is framed and contains binary data. See Section 4.8 of RFC 4288.

Security considerations: See Section 7 "Security Considerations" of RFC 5686 (this RFC).

Interoperability considerations: This media may be readily transcoded to u-law-encoded ITU-T G.711. See Section 4 "Transcoding between UEMCLIP and G.711" of RFC 5686 (this RFC).

Published specification: RFC 5686 (this RFC)

Applications that use this media type: Audio and video streaming and conferencing tools.

Additional information: None

Intended usage: COMMON

Restrictions on usage: This media type depends on RTP framing, and hence is only defined for transfer via RTP.

   Person & email address to contact for further information:
      Yusuke Hiwasaki <hiwasaki.yusuke@lab.ntt.co.jp>
        

Author: Yusuke Hiwasaki

Change Controller: IETF Audio/Video Transport Working Group delegated from the IESG

6.2. Mapping to SDP Parameters

The media type audio/UEMCLIP is mapped to fields in the Session Description Protocol (SDP) [RFC4566] as follows:

Media name: The "m=" line of SDP MUST be audio.

Encoding name: Registered media subtype name should be used for the "a=rtpmap" line.

Sampling Frequency: Depending on the mode, clock rate (sampling frequency) specified in "a=rtpmap" MUST be selected from the ones defined in Table 2. See Section 6.2.1 for details.

Encoding parameters: Since this is an audio stream, the encoding parameters indicate the number of audio channels, and this SHOULD default to "1", as selected from the ones defined in Table 2. This is OPTIONAL.

Packet time: A frame length of any UEMCLIP is 20 ms, thus the argument of "a=ptime" SHOULD be a multiple of "20". When not listed in SDP, it should also default to the minimum size: "20".

UEMCLIP specific: Any description specific to UEMCLIP is defined in the Format Specification Parameters ("a=fmtp"). Each parameter MUST be separated with ";", and if any attribute (value) exists, it MUST be defined with "=". For compatibility reasons, any application/terminal MUST ignore any parameters that it does not understand. This is to ensure upward compatibility with parameters added in future enhancements. The mode specification should be made here (see Section 6.2.1).

6.2.1. Mode Specification

Since the UEMCLIP codec can operate in a number of modes (bit-rates), it is desirable to specify the range of modes at which an encoder or a decoder can operate. When exchanging SDP messages, an offerer should specify all possible combinations of mode numbers as arguments to "mode=" in the "a=fmtp" line, delimited by commas ",". When multiple modes are specified, they SHOULD appear in descending order of priority.

Although UEMCLIP decoders SHOULD accept bitstreams in any mode, an implementation may fail to adapt to dynamic mode changes during a session. For this reason, an application may choose to operate either with one fixed mode or with multiple modes that can be dynamically changed. If the mode is to be fixed and changes are not allowed, this can be indicated by specifying a single mode per payload type.

The mode numbers that can be specified in a payload type as arguments to "mode" are restricted by a combination of a clock rate and a number of audio channels. This is because SDP binds a payload type to a combination of a sampling frequency and a number of audio channels. Table 4 gives selectable mode numbers that are attributed with clock rates. When mode specifications are not given at all, a payload type MUST default to a single mode using the default value specified in this table.

        +------------+----------+------------------+--------------+
        | Clock rate | Channels | Selectable modes | Default mode |
        +------------+----------+------------------+--------------+
        |       8000 |     1    |        0,3       |       0      |
        |            |          |                  |              |
        |      16000 |     1    |      0,1,3,4     |       1      |
        +------------+----------+------------------+--------------+
        

Table 4: Default Modes
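A receiver's handling of the "mode" parameter against Table 4 might look like the sketch below. It assumes the fmtp parameter string has already been separated from the "a=fmtp:<payload type>" prefix; the function names are hypothetical, and rejecting a non-selectable mode with an exception is a choice made for this example only.

```python
from typing import Dict, List, Tuple

# (clock rate, channels) -> (selectable modes, default mode), from Table 4.
MODE_TABLE: Dict[Tuple[int, int], Tuple[Tuple[int, ...], int]] = {
    (8000, 1): ((0, 3), 0),
    (16000, 1): ((0, 1, 3, 4), 1),
}


def parse_fmtp(params: str) -> Dict[str, str]:
    """Split 'name=value;name=value' into a dict; valueless names map to ''."""
    result: Dict[str, str] = {}
    for item in params.split(";"):
        item = item.strip()
        if item:
            name, _, value = item.partition("=")
            result[name.strip()] = value.strip()
    return result


def session_modes(clock_rate: int, fmtp: str = "", channels: int = 1) -> List[int]:
    """Return the modes of a payload type, in the order they were listed."""
    selectable, default = MODE_TABLE[(clock_rate, channels)]
    mode_value = parse_fmtp(fmtp).get("mode")  # unknown parameters are ignored
    if mode_value is None:
        return [default]  # no mode specification: one fixed default mode
    modes = [int(m) for m in mode_value.split(",")]
    for mode in modes:
        if mode not in selectable:
            raise ValueError(f"mode {mode} is not selectable at {clock_rate} Hz")
    return modes
```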

It should be noted that a mode attributed with a larger sampling frequency (Fs) is not used in conjunction with a smaller clock rate specified in "a=rtpmap". This means that Modes 0 and 3 can be specified in a payload type having a clock rate of either 8000 or 16000 in "a=rtpmap", but Modes 1 and 4 cannot be specified in one having a clock rate of 8000.

6.3. Offer-Answer Model Considerations

6.3.1. Offer-Answer Guidelines

The procedures related to exchanging SDP messages MUST follow [RFC3264]. The following is a detailed list on the semantics of using the UEMCLIP payload format in an offer-answer exchange.

o An offerer SHOULD offer every possible combination of UEMCLIP payload type it can handle, i.e., sampling frequency, channel number, and fmtp parameters, in order of preference. When the transmission bandwidth is restricted, the offer MUST be made in accordance with the restriction.

o When multiple UEMCLIP payload types are offered, it is RECOMMENDED that the answerer select a single UEMCLIP payload type and answer it back.

o In a UEMCLIP payload type, an answerer MUST answer back suitable mode number(s) as a subset of what has been offered. This means that there is a symmetry assumption on sent and received streams, and the offerer MUST NOT send in modes that it does not offer.

o In an offering/answering SDP, any fmtp parameters that are not known MUST be ignored. If any unknown/undefined parameters are offered, the answerer MUST delete those entries from the answer message.

o A receiver of an SDP message MUST only use specified payload types and modes. When a mode specification is missing, i.e., a mode is not specified at all, the session MUST default to one single mode without mode changes during a session. For this case, the default mode values, as shown in Table 4, MUST be used based on the sampling frequency and number of channels. This table must be looked up only when there are no mode specifications; thus, the offerer/answerer MUST NOT assume that the default modes are always available when it is not in the specified list of modes.

o When an offered condition does not fit an answerer's capabilities, it naturally MUST NOT answer any of the conditions, and the session MAY proceed to re-INVITE, if possible. If a condition (mode) is decided upon, an offerer and an answerer MUST transmit on this condition.

6.3.2. Examples

When an offerer indicates that he/she wishes to dynamically switch between modes (0, 1, 3, and 4) during a session, an example of an offered SDP could be:

     v=0
     o=john 51050101 51050101 IN IP4 offhost.example.com
     s=-
     c=IN IP4 offhost.example.com
     t=0 0
     m=audio 5004 RTP/AVP 96
     a=rtpmap:96 UEMCLIP/16000/1
     a=fmtp:96 mode=4,1,3,0
        

It should be noted that the listed modes appear in the order of the offerer's preference.

When an answerer can only operate in Modes 1 and 0 but can dynamically switch between those modes during a session, the answerer MUST delete the entries of Modes 3 and 4, and answer back as:

     v=0
     o=lena 549947322 549947322 IN IP4 anshost.example.org
     s=-
     c=IN IP4 anshost.example.org
     t=0 0
     m=audio 5004 RTP/AVP 96
     a=rtpmap:96 UEMCLIP/16000/1
     a=fmtp:96 mode=1,0
        

As a result, both would start communicating in either Mode 1 or 0, and can dynamically switch between those modes during the session.

On the other hand, when the answerer is capable of communicating either in Modes 1 or 0, and cannot switch between modes during a session, an example of such answer is as follows:

     v=0
     o=lena 549947322 549947322 IN IP4 anshost.example.org
     s=-
     c=IN IP4 anshost.example.org
     t=0 0
     m=audio 5004 RTP/AVP 96
     a=rtpmap:96 UEMCLIP/16000/1
     a=fmtp:96 mode=1
        

As a result, both will start communicating in Mode 1. It should be noted that mode change during this session is not allowed because the answerer responded with a single mode, and answerer selected Mode 1 above Mode 0 according to the offered order.
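The mode selection in the two answers above can be sketched as a filter over the offered list that keeps the offerer's order. This is a minimal illustration with hypothetical names; an empty result stands for the case in which none of the offered modes can be answered.

```python
from typing import Iterable, List


def answer_modes(offered: List[int], supported: Iterable[int],
                 allow_switching: bool = True) -> List[int]:
    """Return the subset of offered modes to answer with, in offered order."""
    supported_set = set(supported)
    common = [mode for mode in offered if mode in supported_set]
    if not common:
        return []  # nothing in the offer fits the answerer's capabilities
    # A single answered mode means the mode is fixed for the session.
    return common if allow_switching else common[:1]
```

With the offer "mode=4,1,3,0" and an answerer limited to Modes 1 and 0, this yields `[1, 0]` when switching is allowed and `[1]` when it is not, matching the two answers shown.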

If an offerer does not want a mode change during a session but is capable of receiving either Mode 4 or Mode 1 bitstreams, the SDP should look something like:

     v=0
     o=john 51050101 51050101 IN IP4 offhost.example.com
     s=-
     c=IN IP4 offhost.example.com
     t=0 0
     m=audio 5004 RTP/AVP 96 97
     a=rtpmap:96 UEMCLIP/16000/1
     a=fmtp:96 mode=4
     a=rtpmap:97 UEMCLIP/16000/1
     a=fmtp:97 mode=1
        

and if the answerer prefers to communicate in Mode 1, an answer would be:

     v=0
     o=lena 549947322 549947322 IN IP4 anshost.example.org
     s=-
     c=IN IP4 anshost.example.org
     t=0 0
     m=audio 5004 RTP/AVP 97
     a=rtpmap:97 UEMCLIP/16000/1
     a=fmtp:97 mode=1
        

Please note that it is RECOMMENDED to select a single UEMCLIP payload type for answers.

The "ptime" attribute is used to denote the desired packetization interval. When not specified, it SHOULD default to 20. Since UEMCLIP uses 20-ms frames, ptime values of multiples of 20 imply multiple frames per packet. In the example below, the ptime is set to 60, and this means that the offerer wants to receive 3 frames in each packet.

     v=0
     o=kosuke 2890844730 2890844730 IN IP4 anotherhost.example.com
     s=-
     c=IN IP4 anotherhost.example.com
     t=0 0
     m=audio 5004 RTP/AVP 96
     a=ptime:60
     a=rtpmap:96 UEMCLIP/16000/1
        

When mode specification is not present, it should default to a fixed mode, and in this case, Mode 1 (see Section 6.2.1).
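The ptime arithmetic described above can be sketched as follows. The function name is hypothetical, and treating a ptime that is not a multiple of 20 as an error is a simplification for this example (the text only says it SHOULD be a multiple of 20).

```python
FRAME_MS = 20  # every UEMCLIP frame covers 20 ms


def frames_per_packet(sdp: str) -> int:
    """Return the number of frames per packet implied by an SDP description."""
    ptime = FRAME_MS  # default when no "a=ptime" line is present
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=ptime:"):
            ptime = int(line[len("a=ptime:"):])
    if ptime <= 0 or ptime % FRAME_MS != 0:
        raise ValueError("ptime is expected to be a positive multiple of 20")
    return ptime // FRAME_MS
```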

7. Security Considerations

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [RFC3550] and any appropriate profiles. This implies that confidentiality of the media streams is achieved by encryption unless the applicable profile specifies other means.

A potential denial-of-service threat exists for data encoding using compression techniques that have non-uniform receiver-end computational load. The attacker can inject pathological datagrams into the stream that are complex to decode and cause the receiver to become overloaded. However, the UEMCLIP codec covered in this document does not exhibit any significant non-uniformity.

Another potential threat is memory attacks by illegal layer indices or byte numbers. The implementor of the decoder should always be aware that the indicated numbers may be corrupted and not point to the right sub-layer, and they may force reading beyond the bitstream boundaries. It is advised that a decoder implementation reject layers of such indices.

8. IANA Considerations

One new media subtype (audio/UEMCLIP) has been registered by IANA. For details, see Section 6.1.

9. References

9.1. Normative References

[ITU-T-G.711] International Telecommunications Union, "Pulse code modulation (PCM) of voice frequencies", ITU-T Recommendation G.711, November 1988.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

[RFC4288] Freed, N. and J. Klensin, "Media Type Specifications and Registration Procedures", BCP 13, RFC 4288, December 2005.

[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.

[RFC4855] Casner, S., "Media Type Registration of RTP Payload Formats", RFC 4855, February 2007.

[RFC4856] Casner, S., "Media Type Registration of Payload Formats in the RTP Profile for Audio and Video Conferences", RFC 4856, February 2007.

[RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, January 2008.

9.2. Informative References

[ITU-T-G.711Appendix1] International Telecommunications Union, "Pulse code modulation (PCM) of voice frequencies, Appendix I: A high quality low-complexity algorithm for packet loss concealment with G.711", ITU-T Recommendation G.711 Appendix I, September 1999.

Authors' Addresses

   Yusuke Hiwasaki
   NTT Corporation
   3-9-11 Midori-cho, Musashino-shi
   Tokyo 180-8585
   Japan

   Phone: +81(422)59-4815
   EMail: hiwasaki.yusuke@lab.ntt.co.jp
        

   Hitoshi Ohmuro
   NTT Corporation
   3-9-11 Midori-cho, Musashino-shi
   Tokyo 180-8585
   Japan

   Phone: +81(422)59-2151
   EMail: ohmuro.hitoshi@lab.ntt.co.jp
        