Network Working Group                                         J. Sjoberg
Request for Comments: 4352                                 M. Westerlund
Category: Standards Track                                       Ericsson
                                                            A. Lakaniemi
                                                               S. Wenger
                                                                   Nokia
                                                            January 2006
        

RTP Payload Format for the Extended Adaptive Multi-Rate Wideband (AMR-WB+) Audio Codec

Status of This Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2006).

Abstract

This document specifies a Real-time Transport Protocol (RTP) payload format for Extended Adaptive Multi-Rate Wideband (AMR-WB+) encoded audio signals. The AMR-WB+ codec is an audio extension of the AMR-WB speech codec. It encompasses the AMR-WB frame types and a number of new frame types designed to support high-quality music and speech. A media type registration for AMR-WB+ is included in this specification.

Table of Contents

   1. Introduction
   2. Definitions
      2.1. Glossary
      2.2. Terminology
   3. Background of AMR-WB+ and Design Principles
      3.1. The AMR-WB+ Audio Codec
      3.2. Multi-rate Encoding and Rate Adaptation
      3.3. Voice Activity Detection and Discontinuous Transmission
      3.4. Support for Multi-Channel Session
      3.5. Unequal Bit-Error Detection and Protection
      3.6. Robustness against Packet Loss
           3.6.1. Use of Forward Error Correction (FEC)
           3.6.2. Use of Frame Interleaving
      3.7. AMR-WB+ Audio over IP Scenarios
      3.8. Out-of-Band Signaling
   4. RTP Payload Format for AMR-WB+
      4.1. RTP Header Usage
      4.2. Payload Structure
      4.3. Payload Definitions
           4.3.1. Payload Header
           4.3.2. The Payload Table of Contents
           4.3.3. Audio Data
           4.3.4. Methods for Forming the Payload
           4.3.5. Payload Examples
      4.4. Interleaving Considerations
      4.5. Implementation Considerations
           4.5.1. ISF Recovery in Case of Packet Loss
           4.5.2. Decoding Validation
   5. Congestion Control
   6. Security Considerations
      6.1. Confidentiality
      6.2. Authentication and Integrity
   7. Payload Format Parameters
      7.1. Media Type Registration
      7.2. Mapping Media Type Parameters into SDP
           7.2.1. Offer-Answer Model Considerations
           7.2.2. Examples
   8. IANA Considerations
   9. Contributors
   10. Acknowledgements
   11. References
      11.1. Normative References
      11.2. Informative References
        
1. Introduction

This document specifies the payload format for packetization of Extended Adaptive Multi-Rate Wideband (AMR-WB+) [1] encoded audio signals into the Real-time Transport Protocol (RTP) [3]. The payload format supports the transmission of mono or stereo audio, aggregating multiple frames per payload, and mechanisms enhancing the robustness of the packet stream against packet loss.

The AMR-WB+ codec is an extension of the Adaptive Multi-Rate Wideband (AMR-WB) speech codec. New features include extended audio bandwidth to enable high quality for non-speech signals (e.g., music), native support for stereophonic audio, and the option to operate on, and switch between, several internal sampling frequencies (ISFs). The primary usage scenario for AMR-WB+ is transport over IP. Therefore, interworking with other transport networks, as discussed for AMR-WB in [7], is not a major concern and is not addressed in this memo.

The expected key application for AMR-WB+ is streaming. To make the packetization process on a streaming server as efficient as possible, an octet-aligned payload format is desirable. Therefore, a bandwidth-efficient mode (as defined for AMR-WB in [7]) is not specified herein; the bandwidth savings of the bandwidth-efficient mode would be very small anyway, since all extension frame types are octet aligned.

The stereo encoding capability of AMR-WB+ renders the support for multi-channel transport at RTP payload format level, as specified for AMR-WB [7], obsolete. Therefore, this feature is not included in this memo.

This specification does not include a definition of a file format for AMR-WB+. Instead, it refers to the ISO-based 3GP file format [14], which supports AMR-WB+ and provides all functionality required. The 3GP format also supports storage of AMR, AMR-WB, and many other multi-media formats, thereby allowing synchronized playback.

The rest of the document is organized as follows: Background information on the AMR-WB+ codec, and design principles, can be found in Section 3. The payload format itself is specified in Section 4. Sections 5 and 6 discuss congestion control and security considerations, respectively. In Section 7, a media type registration is provided.

2. Definitions
2.1. Glossary

   3GPP    - Third Generation Partnership Project
   AMR     - Adaptive Multi-Rate (Codec)
   AMR-WB  - Adaptive Multi-Rate Wideband (Codec)
   AMR-WB+ - Extended Adaptive Multi-Rate Wideband (Codec)
   CN      - Comfort Noise
   DTX     - Discontinuous Transmission
   FEC     - Forward Error Correction
   FT      - Frame Type
   ISF     - Internal Sampling Frequency
   SCR     - Source-Controlled Rate Operation
   SID     - Silence Indicator (the frames containing only CN parameters)
   TFI     - Transport Frame Index
   TS      - Timestamp
   VAD     - Voice Activity Detection
   UED     - Unequal Error Detection
   UEP     - Unequal Error Protection

2.2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [2].

3. Background of AMR-WB+ and Design Principles

The Extended Adaptive Multi-Rate Wideband (AMR-WB+) [1] audio codec is designed to compress speech and audio signals at low bit-rate and good quality. The codec is specified by the Third Generation Partnership Project (3GPP). The primary target applications are 1) the packet-switched streaming service (PSS) [13], 2) multimedia messaging service (MMS) [18], and 3) multimedia broadcast and multicast service (MBMS) [19]. However, due to its flexibility and robustness, AMR-WB+ is also well suited for streaming services in other highly varying transport environments, for example, the Internet.

3.1. The AMR-WB+ Audio Codec

3GPP originally developed the AMR-WB+ audio codec for streaming and messaging services in Global System for Mobile communications (GSM) and third generation (3G) cellular systems. The codec is designed as an audio extension of the AMR-WB speech codec. The extension adds new functionality to the codec in order to provide high audio quality for a wide range of signals including music. Stereophonic operation has also been added. A new, high-efficiency hybrid stereo coding algorithm enables stereo operation at bit-rates as low as 6.2 kbit/s.

The AMR-WB+ codec includes the nine frame types specified for AMR-WB, extended by new bit-rates ranging from 5.2 to 48 kbit/s. The AMR-WB frame types can employ only a 16000 Hz sampling frequency and operate only on monophonic signals. The newly introduced extension frame types, however, can operate at a number of internal sampling frequencies (ISFs), both in mono and stereo. Please see Table 24 in [1] for details. The output sampling frequency of the decoder is limited to 8, 16, 24, 32, or 48 kHz.

An overview of the AMR-WB+ encoding operations is provided as follows. The encoder receives the audio sampled at, for example, 48 kHz. The encoding process starts with pre-processing and resampling to the user-selected ISF. The encoding is performed on equally sized super-frames. Each super-frame corresponds to 2048 samples per channel, at the ISF. The codec carries out a number of encoding decisions for each super-frame, thereby choosing between different encoding algorithms and block lengths, so as to achieve a fidelity-optimized encoding adapted to the signal characteristics of the source. The stereo encoding (if used) executes separately from the monophonic core encoding, thus enabling the selection of different combinations of core and stereo encoding rates. The resulting encoded audio is produced in four transport frames of equal length. Each transport frame corresponds to 512 samples at the ISF and is individually usable by the decoder, provided that its position in the super-frame structure is known.

The codec supports 13 different ISFs, ranging from 12.8 to 38.4 kHz, as described by Table 24 of [1]. The high number of ISFs allows a trade-off between the audio bandwidth and the target bit-rate. As encoding is performed on 2048 samples at the ISF, the duration of a super-frame and the effective bit-rate of the frame type in use both vary with the selected ISF.

The ISF of 25600 Hz has a super-frame duration of 80 ms. This is the 'nominal' value used to describe the encoding bit-rates henceforth. Assuming this normalization, the ISF selection results in bit-rate variations from 1/2 up to 3/2 of the nominal bit-rate.
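
The relationship between ISF, super-frame duration, and bit-rate scaling can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the codec specification; the function and constant names are illustrative assumptions, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

SUPERFRAME_SAMPLES = 2048   # samples per channel per super-frame, at the ISF
NOMINAL_ISF_HZ = 25600      # ISF used to quote the nominal bit-rates


def superframe_duration_ms(isf_hz):
    """Duration of one super-frame in milliseconds for a given ISF."""
    return Fraction(SUPERFRAME_SAMPLES * 1000) / Fraction(isf_hz)


def bitrate_factor(isf_hz):
    """Scaling of the nominal bit-rate caused by selecting a given ISF."""
    return Fraction(isf_hz) / NOMINAL_ISF_HZ


if __name__ == "__main__":
    for isf in (12800, 25600, 38400):
        print(isf, float(superframe_duration_ms(isf)), bitrate_factor(isf))
```

The lowest and highest ISFs (12800 and 38400 Hz) give the factors 1/2 and 3/2 quoted above, since the same number of bits per super-frame is spread over a longer or shorter time span.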

The encoding for the extension modes is performed as one monophonic core encoding and one stereo encoding. The core encoding is executed by splitting the monophonic signal into a lower and a higher frequency band. The lower band is encoded employing either algebraic code excited linear prediction (ACELP) or transform coded excitation (TCX). This selection can be made once per transport frame, but must obey certain limitations of legal combinations within the super-frame. The higher band is encoded using a low-rate parametric bandwidth extension approach.

The stereo signal is encoded employing a similar frequency band decomposition; however, here the signal is divided into three bands that are individually parameterized.

The total bit-rate produced by the extension is the result of the combination of the encoder's core rate, stereo rate, and ISF. The extension supports 8 different core encoding rates, producing bit-rates between 10.4 and 24.0 kbit/s; see Table 22 in [1]. There are 16 stereo encoding rates generating bit-rates between 2.0 and 8.0 kbit/s; see Table 23 in [1]. The frame type uniquely identifies the AMR-WB modes, 4 fixed extension rates (see below), 24 combinations of core and stereo rates for stereo signals, and the 8 core rates for mono signals, as listed in Table 25 in [1]. This implies that the AMR-WB+ supports encoding rates between 10.4 and 32 kbit/s, assuming an ISF of 25600 Hz.

Different ISFs allow for additional freedom in the produced bit-rates and audio quality. The selection of an ISF changes the available audio bandwidth of the reconstructed signal, and also the total bit-rate. The bit-rate for a given combination of frame type and ISF is determined by multiplying the frame type's bit-rate with the used ISF's bit-rate factor; see Table 24 in [1].

The extension also has four frame types which have fixed ISFs. Please see frame types 10-13 in Table 21 in [1]. These four pre-defined frame types have a fixed input sampling frequency at the encoder, which can be set at either 16 or 24 kHz. Like the AMR-WB frame types, transport frames encoded utilizing these frame types represent exactly 20 ms of the audio signal. However, they are also part of 80 ms super-frames. Frame types 0-13 (AMR-WB and fixed extension rates), as listed in Table 21 in [1], do not require an explicit ISF indication. The other frame types, 14-47, require the ISF employed to be indicated.

The 32 different frame types of the extension, in combination with 13 ISFs, allow for great flexibility in bit-rate and selection of desired audio quality. A number of combinations exist that produce the same codec bit-rate. For example, a 32 kbit/s audio stream can be produced by utilizing frame type 41 (i.e., 25.6 kbit/s) and the ISF of 32 kHz (5/4 * (19.2+6.4) = 32 kbit/s), or frame type 47 and the ISF of 25.6 kHz (1 * (24 + 8) = 32 kbit/s). Which combination is more beneficial for the perceived audio quality depends on the content. In the above example, the first case provides a higher audio bandwidth, while the second one spends the same number of bits on somewhat narrower audio bandwidth but provides higher fidelity. Encoders are free to select the combination they deem most beneficial.
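
The two ways of reaching 32 kbit/s in this example can be reproduced numerically. The sketch below is illustrative only; the function name is an assumption, and the rates and ISF factors are the ones quoted in the example rather than values read from the tables in [1].

```python
from fractions import Fraction


def total_bitrate_kbps(core_kbps, stereo_kbps, isf_factor):
    """Total bit-rate = (core rate + stereo rate) * ISF bit-rate factor.

    Rates may be given as strings such as "19.2" so that they are
    converted to exact fractions instead of binary floating point.
    """
    return (Fraction(core_kbps) + Fraction(stereo_kbps)) * Fraction(isf_factor)


if __name__ == "__main__":
    # Frame type 41 at an ISF of 32 kHz (factor 5/4).
    print(total_bitrate_kbps("19.2", "6.4", Fraction(5, 4)))
    # Frame type 47 at an ISF of 25.6 kHz (factor 1).
    print(total_bitrate_kbps("24", "8", 1))
```

Both calls yield the same total, which is why the frame type alone does not determine the audio bandwidth a receiver will hear.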

Since a transport frame always corresponds to 512 samples at the used ISF, its duration is limited to the range 13.33 to 40 ms; see Table 1. An RTP Timestamp clock rate of 72000 Hz, as mandated by this specification, results in AMR-WB+ transport frame lengths of 960 to 2880 timestamp ticks, depending solely on the selected ISF.

      Index   ISF   Duration(ms) Duration(TS Ticks @ 72 kHz)
      ------------------------------------------------------
        0     N/A      20             1440
        1    12800     40             2880
        2    14400     35.55          2560
        3    16000     32             2304
        4    17067     30             2160
        5    19200     26.67          1920
        6    21333     24             1728
        7    24000     21.33          1536
        8    25600     20             1440
        9    28800     17.78          1280
       10    32000     16             1152
       11    34133     15             1080
       12    36000     14.22          1024
       13    38400     13.33           960
        

Table 1: Normative number of RTP Timestamp Ticks for each Transport Frame depending on ISF (ISF and Duration in ms are rounded)
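
The tick counts in Table 1 follow from 512 samples per transport frame and the 72000 Hz RTP clock. The following sketch recomputes them; it is not normative. It assumes that the ISFs printed as 17067, 21333, and 34133 Hz are rounded forms of 51200/3, 64000/3, and 102400/3 Hz, which is what makes the tick counts whole numbers. All names are illustrative.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 72000            # RTP timestamp clock rate mandated by this memo
TRANSPORT_FRAME_SAMPLES = 512   # samples per transport frame, at the ISF

# Assumption: entries 4, 6, and 11 of Table 1 are rounded; the exact
# values used here are thirds of a Hz multiple.
EXACT_ISF_HZ = {
    1: Fraction(12800), 2: Fraction(14400), 3: Fraction(16000),
    4: Fraction(51200, 3), 5: Fraction(19200), 6: Fraction(64000, 3),
    7: Fraction(24000), 8: Fraction(25600), 9: Fraction(28800),
    10: Fraction(32000), 11: Fraction(102400, 3), 12: Fraction(36000),
    13: Fraction(38400),
}


def ticks_per_transport_frame(isf_hz):
    """RTP timestamp ticks covered by one transport frame at the given ISF."""
    ticks = Fraction(TRANSPORT_FRAME_SAMPLES * RTP_CLOCK_HZ) / Fraction(isf_hz)
    if ticks.denominator != 1:
        raise ValueError("ISF does not yield a whole number of ticks")
    return int(ticks)


if __name__ == "__main__":
    for index, isf in EXACT_ISF_HZ.items():
        print(index, float(isf), ticks_per_transport_frame(isf))
```

A clock rate of 72000 Hz is convenient here because it divides evenly for every ISF in the table, so timestamps never accumulate rounding error.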

The encoder is free to change both the ISF and the encoding frame type (both mono and stereo) during a session. For the extension frame types with index 10-13 and 16-47, the ISF and frame type changes are constrained to occur at super-frame boundaries. This implies that, for the frame types mentioned, the ISF is constant throughout a super-frame. This limitation does not apply for frame types with index 0-9, 14, and 15; i.e., the original AMR-WB frame types.
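
The frame type index ranges mentioned in this section can be captured in two small predicates. This is a sketch that only encodes the ranges as stated in the text (0-13 need no ISF indication, 14-47 do; changes for 10-13 and 16-47 are tied to super-frame boundaries). The function names are hypothetical and not taken from [1].

```python
MAX_FRAME_TYPE = 47  # highest frame type index mentioned in this section


def _check(ft):
    """Reject indices outside the 0-47 range discussed in the text."""
    if not 0 <= ft <= MAX_FRAME_TYPE:
        raise ValueError("frame type index out of range: %r" % (ft,))


def requires_isf_indication(ft):
    """Frame types 14-47 need the ISF to be indicated; 0-13 do not."""
    _check(ft)
    return ft >= 14


def change_only_at_superframe_boundary(ft):
    """ISF and frame type changes are tied to super-frame boundaries
    for indices 10-13 and 16-47, but not for 0-9, 14, and 15."""
    _check(ft)
    return 10 <= ft <= 13 or 16 <= ft <= MAX_FRAME_TYPE


if __name__ == "__main__":
    for ft in (0, 9, 10, 13, 14, 15, 16, 47):
        print(ft, requires_isf_indication(ft),
              change_only_at_superframe_boundary(ft))
```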

A number of features of the AMR-WB+ codec require special consideration from a transport point of view and call for solutions that could perhaps be viewed as unorthodox. First, there are constraints on the RTP timestamping, due to the relationship of the frame duration and the ISFs. Second, each frame of encoded audio must maintain information about its frame type, ISF, and position in the super-frame.

3.2. Multi-rate Encoding and Rate Adaptation

The multi-rate encoding capability of AMR-WB+ is designed to preserve high audio quality under a wide range of bandwidth requirements and transmission conditions.

AMR-WB+ enables seamless switching between frame types that use the same number of audio channels and the same ISF. Every AMR-WB+ codec implementation is required to support all frame types defined by the codec and must be able to handle switching between any two frame types. Switching between frame types employing a different number of audio channels or a different ISF must also be supported, but it may not be completely seamless. Therefore, it is recommended to perform such switching infrequently and, if possible, during periods of silence.

3.3. Voice Activity Detection and Discontinuous Transmission

AMR-WB+ supports the same algorithms as AMR-WB for voice activity detection (VAD) and generation of comfort noise (CN) parameters during silence periods. However, these functionalities can only be used in conjunction with the AMR-WB frame types (FT=0-8). This option allows reducing the number of transmitted bits and packets during silence periods to a minimum. The operation of sending CN parameters at regular intervals during silence periods is usually called discontinuous transmission (DTX) or source controlled rate (SCR) operation. The AMR-WB+ frames containing CN parameters are called Silence Indicator (SID) frames. More details about the VAD and DTX functionality are provided in [4] and [5].

3.4. Support for Multi-Channel Session

Some of the AMR-WB+ frame types support the encoding of stereophonic audio. Because of this native support for a two-channel stereophonic signal, it does not seem necessary to support multi-channel transport with separate codec instances, as specified in the AMR-WB RTP payload [7]. The codec has the capability of stereo to mono downmixing as part of the decoding process. Thus, a receiver that is only capable of playout of monophonic audio must still be able to decode and play signals originally encoded and transmitted as stereo. However, to avoid spending bits on a stereo encoding that is not going to be utilized, a mechanism is defined in this specification to signal mono-only audio.

3.5. Unequal Bit-Error Detection and Protection

The audio bits encoded in each AMR-WB frame are sorted according to their different perceptual sensitivity to bit errors. In cellular systems, for example, this property can be exploited to achieve better voice quality, by using unequal error protection and detection (UEP and UED) mechanisms. However, the bits of the extension frame types of the AMR-WB+ codec do not have a consistent perceptual significance property and are not sorted in this order. Thus, UEP or UED is meaningless with the extension frame types. If there is a need to use UEP or UED for AMR-WB frame types, it is recommended that RFC 3267 [7] be used.

3.6. Robustness against Packet Loss

The payload format supports two mechanisms to improve robustness against packet loss: simple forward error correction (FEC) and frame interleaving.

3.6.1. Use of Forward Error Correction (FEC)

Generic forward error correction within RTP is defined, for example, in RFC 2733 [11]. Audio redundancy coding is defined in RFC 2198 [12]. Either scheme can be used to add redundant information to the RTP packet stream and make it more resilient to packet losses, at the expense of a higher bit rate. Please see either RFC for a discussion of the implications of the higher bit rate to network congestion.

In addition to these media-unaware mechanisms, this memo specifies an AMR-WB+ specific form of audio redundancy coding, which may be beneficial in terms of packetization overhead.

Conceptually, previously transmitted transport frames are aggregated together with new ones. A sliding window is used to group the frames to be sent in each payload. Figure 1 below shows an example.

   --+--------+--------+--------+--------+--------+--------+--------+--
     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
   --+--------+--------+--------+--------+--------+--------+--------+--
        
     <---- p(n-1) ---->
              <----- p(n) ----->
                       <---- p(n+1) ---->
                                <---- p(n+2) ---->
                                         <---- p(n+3) ---->
                                                  <---- p(n+4) ---->
        

Figure 1: An example of redundant transmission

Here, each frame is retransmitted once in the following RTP payload packet. The notation f(n-2)...f(n+4) denotes a sequence of audio frames, and p(n-1)...p(n+4) a sequence of payload packets.

The mechanism described does not require signaling at the session setup. In other words, the audio sender can choose to use this scheme without consulting the receiver. For a certain timestamp, the receiver may receive multiple copies of a frame containing encoded audio data or frames indicated as NO_DATA. The cost of this scheme is bandwidth and the receiver delay necessary to allow the redundant copy to arrive.
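
The sliding-window grouping of Figure 1 can be sketched as follows. This is an illustration of the concept only, not the payload packing defined in Section 4. Frames are represented by plain labels, and the helper names and the `copies` parameter are assumptions made for the example.

```python
def redundant_payloads(frames, copies=2):
    """Group frames so that payload k carries the `copies` most recent
    frames up to and including frame k (fewer at the start of the stream)."""
    payloads = []
    for k in range(len(frames)):
        start = max(0, k - copies + 1)
        payloads.append(frames[start:k + 1])
    return payloads


def received_frames(payloads, lost):
    """Frames still available when the payloads with indices in `lost`
    never arrive. Duplicate copies collapse into one entry."""
    available = set()
    for index, payload in enumerate(payloads):
        if index not in lost:
            available.update(payload)
    return available


if __name__ == "__main__":
    stream = ["f0", "f1", "f2", "f3"]
    packets = redundant_payloads(stream)
    print(packets)
    print(sorted(received_frames(packets, lost={2})))
```

With one redundant copy per frame, any single lost payload is fully covered by its successor, whereas two consecutive losses leave a gap of one frame. The receiver must also wait for the next payload before declaring a frame lost, which is the delay cost noted above.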

This redundancy scheme provides a functionality similar to the one described in RFC 2198, but it works only if both original frames and redundant representations are AMR-WB+ frames. When the use of other media coding schemes is desirable, one has to resort to RFC 2198.

The sender is responsible for selecting an appropriate amount of redundancy based on feedback about the channel conditions, e.g., in the RTP Control Protocol (RTCP) [3] receiver reports. The sender is also responsible for avoiding congestion, which may be exacerbated by redundancy (see Section 5 for more details).

3.6.2. Use of Frame Interleaving

To decrease protocol overhead, the payload design allows several audio transport frames to be encapsulated into a single RTP packet. One of the drawbacks of such an approach is that in case of packet loss several consecutive frames are lost. Consecutive frame loss normally renders error concealment less efficient and usually causes clearly audible and annoying distortions in the reconstructed audio. Interleaving of transport frames can improve the audio quality in such cases by distributing the consecutive losses into a number of isolated frame losses, which are easier to conceal. However, interleaving and bundling several frames per payload also increases end-to-end delay and sets higher buffering requirements. Therefore, interleaving is not appropriate for all use cases or devices. Streaming applications should most likely be able to exploit interleaving to improve audio quality in lossy transmission conditions.

Note that this payload design supports the use of frame interleaving as an option. The usage of this feature needs to be negotiated in the session setup.

The interleaving supported by this format is rather flexible. For example, a continuous pattern can be defined, as depicted in Figure 2.

   --+--------+--------+--------+--------+--------+--------+--------+--
     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
   --+--------+--------+--------+--------+--------+--------+--------+--
        
              [ P(n)   ]
     [ P(n+1) ]                 [ P(n+1) ]
                       [ P(n+2) ]                 [ P(n+2) ]
                                         [ P(n+3) ]                 [P(
                                                           [ P(n+4) ]
        

Figure 2: An example of interleaving pattern that has constant delay

In Figure 2 the consecutive frames, denoted f(n-2) to f(n+4), are aggregated into packets P(n) to P(n+4), each packet carrying two frames. This approach provides an interleaving pattern that allows for constant delay in both the interleaving and deinterleaving processes. The deinterleaving buffer needs to have room for at least three frames, including the one that is ready to be consumed. The storage space for three frames is needed, for example, when f(n) is the next frame to be decoded: since frame f(n) was received in packet P(n+2), which also carried frame f(n+3), both these frames are stored in the buffer. Furthermore, frame f(n+1) received in the previous packet, P(n+1), is also in the deinterleaving buffer. Note also that in this example the buffer occupancy varies: when frame f(n+1) is the next one to be decoded, there are only two frames, f(n+1) and f(n+3), in the buffer.

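The buffer occupancy described above can be checked with a short simulation. The following Python sketch is illustrative only and not part of the payload format. It assumes, as read off Figure 2, that packet P(n+j) carries frames f(n+2j-4) and f(n+2j-1), and that two frames are decoded per packet interval. The function name and the relative indexing (frame 0 stands for f(n)) are choices made for this example.

```python
def simulate_figure2(num_packets=6):
    """Replay the Figure 2 pattern and record deinterleaving buffer occupancy.

    Indices are relative to n: frame 0 is f(n), packet 2 is P(n+2).
    Assumption read off the figure: packet j carries frames 2j-4 and 2j-1.
    """
    buffer = set()
    occupancy = {}  # frame index -> frames buffered when it is next to decode
    for j in range(num_packets):
        buffer.update((2 * j - 4, 2 * j - 1))
        # Two frames are consumed per packet interval.
        for k in (2 * j - 4, 2 * j - 3):
            if k in buffer:  # a frame is missing only during start-up
                occupancy[k] = len(buffer)
                buffer.remove(k)
    return occupancy


occ = simulate_figure2()
print(occ[0], occ[1])  # prints "3 2": occupancy when f(n), then f(n+1), is next
```

The alternation between three and two buffered frames matches the description above; the first packet of the run shows lower occupancy only because the simulation starts with an empty buffer.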

3.7. AMR-WB+ Audio over IP Scenarios

Since the primary target application for the AMR-WB+ codec is streaming over packet networks, the most relevant usage scenario for this payload format is IP end-to-end between a server and a terminal, as shown in Figure 3.

              +----------+                          +----------+
              |          |    IP/UDP/RTP/AMR-WB+    |          |
              |  SERVER  |<------------------------>| TERMINAL |
              |          |                          |          |
              +----------+                          +----------+
        

Figure 3: Server to terminal IP scenario

3.8. Out-of-Band Signaling

Some of the options of this payload format remain constant throughout a session. Therefore, they can be controlled/negotiated at the session setup. Throughout this specification, these options and variables are denoted as "parameters to be established through out-of-band means". In Section 7, all the parameters are formally specified in the form of a media type registration for the AMR-WB+ encoding. The method used to signal these parameters at session setup or to arrange prior agreement of the participants is beyond the scope of this document; however, Section 7.2 provides a mapping of the parameters into the Session Description Protocol (SDP) [6] for those applications that use SDP.

4. RTP Payload Format for AMR-WB+

The main emphasis in the payload design for AMR-WB+ has been to minimize the overhead in typical use cases, while providing full flexibility with a slightly higher overhead. In order to keep the specification reasonably simple, we refrained from defining frame-specific parameters for each frame type. Instead, a few common parameters were specified that cover all types of frames.

The payload format has two modes: basic mode and interleaved mode. The main structural difference between the two modes is that, in interleaved mode, the table of contents entries are extended with frame displacement fields. The basic mode supports aggregation of multiple consecutive frames in a payload. The interleaved mode supports aggregation of multiple frames that are non-consecutive in time. In both modes it is possible to have frames encoded with different frame types in the same payload. The ISF must remain constant throughout the payload of a single packet.

The payload format is designed around the property of AMR-WB+ frames that the frames are consecutive in time and share the same frame duration (in the absence of an ISF change). This enables the receiver to derive the timestamp for an individual frame within a payload. In basic mode, the deriving process is based on the order of frames. In interleaved mode, it is based on the compact displacement fields. The frame timestamps are used to regenerate the correct order of frames after reception, identify duplicates, and detect lost frames that require concealment.

The interleaving scheme of this payload format is significantly more flexible than the one specified in RFC 3267. The AMR and AMR-WB payload format is only capable of using periodic patterns with frames taken from an interleaving group at fixed intervals. The interleaving scheme of this specification, in contrast, allows for any interleaving pattern, as long as the distance in decoding order between any two adjacent frames is not more than 256 frames. Note that even at the highest ISF this allows an interleaving depth of up to 3.41 seconds.

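The 3.41-second figure can be reproduced with a short calculation. The sketch below is illustrative; it assumes that the highest ISF in Table 1 is 38400 Hz (the table is not reproduced in this section) and uses the 512 samples per transport frame stated in Section 4.3.3. The constant and function names are hypothetical.

```python
SAMPLES_PER_FRAME = 512     # samples per transport frame at the ISF
MAX_DISTANCE_FRAMES = 256   # limit on the decoding-order distance
HIGHEST_ISF_HZ = 38400      # assumption: highest ISF listed in Table 1


def max_interleaving_depth_seconds(isf_hz=HIGHEST_ISF_HZ):
    """Time span covered by the maximum frame distance at a given ISF."""
    return MAX_DISTANCE_FRAMES * SAMPLES_PER_FRAME / isf_hz


print(round(max_interleaving_depth_seconds(), 2))  # 3.41
```

Lower internal sampling frequencies give longer frames and therefore a larger depth in seconds for the same 256-frame limit.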

To allow for error resiliency through redundant transmission, the periods covered by multiple packets MAY overlap in time. A receiver MUST be prepared to receive any audio frame multiple times. All redundantly sent frames MUST use the same frame type and ISF, and MUST have the same RTP timestamp, or MUST be a NO_DATA frame (FT=15).

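One possible way for a receiver to cope with frames that arrive several times is to index them by their derived RTP timestamp and to keep a real frame in preference to a NO_DATA placeholder. The following Python sketch illustrates that idea only; the tuple layout and the function name are assumptions made for this example, not something this specification defines.

```python
NO_DATA = 15  # frame type value for NO_DATA frames


def merge_redundant(frames):
    """Merge frames from several packets, one result per RTP timestamp.

    frames: iterable of (timestamp, frame_type, data) tuples.
    A frame carrying data replaces a NO_DATA entry with the same
    timestamp; later duplicates of a real frame are ignored.
    """
    merged = {}
    for ts, frame_type, data in frames:
        held = merged.get(ts)
        if held is None or (held[0] == NO_DATA and frame_type != NO_DATA):
            merged[ts] = (frame_type, data)
    return merged


received = [
    (12345, NO_DATA, b""),     # placeholder in one packet
    (12345, 2, b"\x01\x02"),   # the real frame arrives in another packet
    (12345, 2, b"\x01\x02"),   # redundant copy, dropped
    (13497, 2, b"\x03\x04"),
]
print(sorted(merge_redundant(received)))  # [12345, 13497]
```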

The payload consists of octet-aligned elements (header, ToC, and audio frames). Only the audio frames for AMR-WB frame types (0-9) require padding for octet alignment. If additional padding is desired, then the P bit in the RTP header MAY be set, and padding MAY be appended as specified in [3].

4.1. RTP Header Usage

The format of the RTP header is specified in [3]. This payload format uses the fields of the header in a manner consistent with that specification.

The RTP timestamp corresponds to the sampling instant of the first sample encoded for the first frame in the packet. The timestamp clock frequency SHALL be 72000 Hz. This frequency allows the frame duration to be an integer number of RTP timestamp ticks for the ISFs specified in Table 1. It also provides reasonable conversion factors to the input/output audio sampling frequencies supported by the codec. See Section 4.3.2.3 for guidance on how to derive the RTP timestamp for any audio frame beyond the first one.

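The relationship between the ISF and the frame duration in timestamp ticks can be sketched as follows. This Python fragment is illustrative; it assumes 512 samples per transport frame (see Section 4.3.3) and uses exact rational arithmetic so that a non-integer result is reported rather than silently rounded. Only the two ISF values that appear in this section are exercised; the names are hypothetical.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 72000      # timestamp clock frequency required above
SAMPLES_PER_FRAME = 512   # samples per transport frame at the ISF


def frame_duration_ticks(isf_hz):
    """Frame duration in RTP timestamp ticks for a given ISF in Hz."""
    ticks = Fraction(SAMPLES_PER_FRAME * RTP_CLOCK_HZ, isf_hz)
    if ticks.denominator != 1:
        raise ValueError(f"{isf_hz} Hz does not give an integer tick count")
    return ticks.numerator


print(frame_duration_ticks(32000))  # 1152 (16 ms frames)
print(frame_duration_ticks(25600))  # 1440 (20 ms frames)
```

Note that an ISF quoted as a rounded value in Hz would need to be passed in its exact rational form for the integrality check to succeed.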

The RTP header marker bit (M) SHALL be set to 1 whenever the first frame carried in the packet is the first frame in a talkspurt (see the definition of talkspurt in Section 4.1 of [9]). For all other packets, the marker bit SHALL be set to zero (M=0).

The assignment of an RTP payload type for the format defined in this memo is outside the scope of this document. The RTP profile in use either assigns a static payload type or mandates binding the payload type dynamically.

The media type parameter "channels" is used to indicate the maximum number of channels allowed for a given payload type. A payload type where channels=1 (mono) SHALL only carry mono content. A payload type for which channels=2 has been declared MAY carry both mono and stereo content. Note that this definition is different from the one in RFC 3551 [9]. As mentioned before, the AMR-WB+ codec handles the support of stereo content and the (eventual) downmixing of stereo to mono internally. This makes it unnecessary to negotiate for the number of channels for reasons other than bit-rate efficiency.

4.2. Payload Structure

The payload consists of a payload header, a table of contents, and the audio data representing one or more audio frames. The following diagram shows the general payload format layout:

   +----------------+-------------------+----------------
   | payload header | table of contents | audio data ...
   +----------------+-------------------+----------------
        

Payloads containing more than one audio frame are called compound payloads.

The following sections describe the variations taken by the payload format depending on the mode in use: basic mode or interleaved mode.

4.3. Payload Definitions

4.3.1. Payload Header

The payload header carries data that is common for all frames in the payload. The structure of the payload header is described below.

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |   ISF   |TFI|L|
   +-+-+-+-+-+-+-+-+
        

ISF (5 bits): Indicates the Internal Sampling Frequency employed for all frames in this payload. The index value corresponds to internal sampling frequency as specified in Table 24 in [1]. This field SHALL be set to 0 for payloads containing frames with Frame Type values 0-13.

TFI (2 bits): Transport Frame Index, from 0 (first) to 3 (last), indicating the position of the first transport frame of this payload in the AMR-WB+ super-frame structure. For payloads with frames of only Frame Type values 0-9, this field SHALL be set to 0 by the sender. The TFI value for a frame of type 0-9 SHALL be ignored by the receiver. Note that the frame type is coded in the table of contents (as discussed later); hence, the mentioned dependencies of the frame type can be applied easily by interpreting only values carried in the payload header. It is not necessary to interpret the audio bit stream itself.

L (1 bit): Long displacement field flag for payloads in interleaved mode. If set to 0, four-bit displacement fields are used to indicate interleaving offset; if set to 1, displacement fields of eight bits are used (see Section 4.3.2.2). For payloads in the basic mode, this bit SHALL be set to 0 and SHALL be ignored by the receiver.

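As an illustration of the bit layout above, the following Python sketch splits the one-octet payload header into its three fields. Bit 0 in the diagram is the most significant bit of the octet. The function name and the returned tuple are choices made for this example.

```python
def parse_payload_header(octet):
    """Split the payload header octet into (ISF, TFI, L)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("payload header must be a single octet")
    isf = octet >> 3            # bits 0-4: internal sampling frequency index
    tfi = (octet >> 1) & 0x03   # bits 5-6: transport frame index
    long_dis = octet & 0x01     # bit 7: long displacement field flag
    return isf, tfi, long_dis


print(parse_payload_header(0x55))  # (10, 2, 1)
```

A basic-mode receiver would ignore the returned L value, as required above.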

Note that frames employing different ISF values require encapsulation in separate packets. Thus, special considerations apply when generating interleaved packets and an ISF change is executed. In particular, frames that, according to the previously used interleaving pattern, would be aggregated into a single packet have to be separated into different packets, so that the aforementioned condition (all frames in a packet share the ISF) remains true. A naive implementation that splits the frames with different ISF into different packets can result in up to twice the number of RTP packets, when compared to an optimal interleaved solution. Alteration of the interleaving before and after the ISF change may reduce the need for extra RTP packets.

4.3.2. The Payload Table of Contents

The table of contents (ToC) consists of a list of entries, each of which corresponds to a group of audio frames carried in the payload, as depicted below.

   +----------------+----------------+- ... -+----------------+
   |  ToC entry #1  |  ToC entry #2  |          ToC entry #N  |
   +----------------+----------------+- ... -+----------------+
        

When multiple groups of frames are present in a payload, the ToC entries SHALL be placed in the packet in order of increasing RTP timestamp value (modulo 2^32) of the first transport frame the ToC entry represents.

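Ordering timestamps "modulo 2^32" means that a wrap-around of the 32-bit RTP timestamp must not be mistaken for a step backwards. This specification does not spell out a comparison rule; the Python sketch below shows one common interpretation, in the style of serial number arithmetic, and is an assumption of this example: a precedes b if the forward distance from a to b is less than half the timestamp space.

```python
def ts_before(a, b):
    """True if 32-bit RTP timestamp a precedes b, allowing for wrap-around."""
    diff = (b - a) & 0xFFFFFFFF
    return 0 < diff < 0x80000000


print(ts_before(0xFFFFFF00, 0x00000100))  # True: b lies just after the wrap
print(ts_before(0x00000100, 0xFFFFFF00))  # False
```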

4.3.2.1. ToC Entry in the Basic Mode

A ToC entry of a payload in the basic mode has the following format:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F| Frame Type  |    #frames    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

F (1 bit): If set to 1, indicates that this ToC entry is followed by another ToC entry; if set to 0, indicates that this ToC entry is the last one in the ToC.

Frame Type (FT) (7 bits): Indicates the audio codec frame type used for the group of frames referenced by this ToC entry. FT designates the combination of AMR-WB+ core and stereo rate, one of the special AMR-WB+ frame types, the AMR-WB rate, or comfort noise, as specified by Table 25 in [1].

#frames (8 bits): Indicates the number of frames in the group referenced by this ToC entry. ToC entries with this field equal to 0 (which would indicate zero frames) SHALL NOT be used, and received packets with such a ToC entry SHALL be discarded.

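The basic-mode ToC can be walked entry by entry until an entry with F=0 is found. The following Python sketch is a minimal illustration under the assumption that the input starts immediately after the payload header octet; the function name, the return shape, and the frame type values in the usage line are placeholders chosen for this example.

```python
def parse_basic_toc(data):
    """Parse a basic-mode ToC.

    data: bytes starting right after the payload header.
    Returns (entries, octets_consumed), where each entry is
    (frame_type, number_of_frames).
    """
    entries = []
    offset = 0
    while True:
        if offset + 2 > len(data):
            raise ValueError("truncated ToC")
        first, count = data[offset], data[offset + 1]
        offset += 2
        if count == 0:
            # Such packets are to be discarded by the receiver.
            raise ValueError("ToC entry with #frames = 0")
        entries.append((first & 0x7F, count))  # low 7 bits: Frame Type
        if not first >> 7:                     # F = 0: last entry
            return entries, offset


toc = bytes([0x82, 1, 0x83, 1, 0x04, 1])  # three entries, one frame each
print(parse_basic_toc(toc))  # ([(2, 1), (3, 1), (4, 1)], 6)
```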

4.3.2.2. ToC Entry in the Interleaved Mode

Two different ToC entry formats are defined in interleaved mode. They differ in the length of the displacement field, 4 bits or 8 bits. The L bit in the payload header distinguishes between the two formats.

If L=0, a ToC entry has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F| Frame Type  |    #frames    |  DIS1 |  ...  |  DISi |  ...  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ...  |  ...  |  DISn |  Padd |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

F (1 bit): See definition in 4.3.2.1.

Frame Type (FT) (7 bits): See definition in 4.3.2.1.

#frames (8 bits): See definition in 4.3.2.1.

DIS1...DISn (4 bits): A list of n (n=#frames) displacement fields indicating the displacement of the i:th (i=1..n) audio frame relative to the preceding audio frame in the payload, in units of frames. The four-bit unsigned integer displacement values may be between 0 and 15, indicating the number of audio frames in decoding order between the (i-1):th and the i:th frame in the payload. Note that for the first ToC entry of the payload, the value of DIS1 is meaningless. It SHALL be set to zero by a sender and SHALL be ignored by a receiver. This frame's location in the decoding order is uniquely defined by the RTP timestamp and TFI in the payload header. Note also that for subsequent ToC entries, DIS1 indicates the number of frames between the last frame of the previous group and the first frame of this group.

Padd (4 bits): To ensure octet alignment, four padding bits SHALL be included at the end of the ToC entry if there is an odd number of frames in the group referenced by this entry. These bits SHALL be set to zero and SHALL be ignored by the receiver. If a group containing an even number of frames is referenced by this ToC entry, these padding bits SHALL NOT be included in the payload.

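The padding rule determines the size of an interleaved-mode ToC entry. The sketch below is illustrative: it computes the entry length in octets for both displacement field widths, where the two fixed octets hold F, Frame Type, and #frames. The function name and its boolean parameter (standing for the L bit) are assumptions of this example.

```python
def interleaved_toc_entry_octets(num_frames, long_displacement):
    """Length in octets of one interleaved-mode ToC entry."""
    if not 1 <= num_frames <= 255:
        raise ValueError("#frames must be between 1 and 255")
    if long_displacement:
        # L=1: one octet per displacement field, no padding needed.
        return 2 + num_frames
    # L=0: four bits per field, plus four padding bits if the count is odd.
    return 2 + (num_frames + 1) // 2


print(interleaved_toc_entry_octets(1, False))  # 3 (DIS1 + Padd)
print(interleaved_toc_entry_octets(2, False))  # 3 (DIS1 + DIS2)
print(interleaved_toc_entry_octets(3, True))   # 5
```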

If L=1, a ToC entry has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F| Frame Type  |    #frames    |      DIS1     |      ...      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ...      |     DISn      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

F (1 bit): See definition in 4.3.2.1.

Frame Type (FT) (7 bits): See definition in 4.3.2.1.

#frames (8 bits): See definition in 4.3.2.1.

DIS1...DISn (8 bits): A list of n (n=#frames) displacement fields indicating the displacement of the i:th (i=1..n) audio frame relative to the preceding audio frame in the payload, in units of frames. The eight-bit unsigned integer displacement values may be between 0 and 255, indicating the number of audio frames in decoding order between the (i-1):th and the i:th frame in the payload. Note that for the first ToC entry of the payload, the value of DIS1 is meaningless. It SHALL be set to zero by a sender and SHALL be ignored by a receiver. This frame's location in the decoding order is uniquely defined by the RTP timestamp and TFI in the payload header. Note also that for subsequent ToC entries, DIS1 indicates the displacement between the last frame of the previous group and the first frame of this group.

4.3.2.3. RTP Timestamp Derivation

The RTP Timestamp value for a frame SHALL be the timestamp value of the first audio sample encoded in the frame. The timestamp value for a frame is derived differently depending on the payload mode, basic or interleaved. In both cases, the first frame in a compound packet has an RTP timestamp equal to the one received in the RTP header. In the basic mode, the RTP time for any subsequent frame is derived in two steps. First, the sum of the frame durations (see Table 1) of all the preceding frames in the payload is calculated. Then, this sum is added to the RTP header timestamp value. For example, assume that the RTP header timestamp value is 12345, the payload carries four frames, and the frame duration is 16 ms (ISF = 32 kHz) corresponding to 1152 timestamp ticks. Then the RTP timestamp of the fourth frame in the payload is 12345 + 3 * 1152 = 15801.

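The basic-mode derivation can be written out as a short Python sketch. It is illustrative only and assumes that every frame in the payload has the same duration, which follows from the constant ISF within a packet; the function name is hypothetical. The result is reduced modulo 2^32 because RTP timestamps are 32-bit values.

```python
def basic_mode_timestamps(header_ts, num_frames, ticks_per_frame):
    """RTP timestamps of consecutive frames in a basic-mode payload."""
    return [(header_ts + i * ticks_per_frame) % 2**32
            for i in range(num_frames)]


print(basic_mode_timestamps(12345, 4, 1152))
# [12345, 13497, 14649, 15801]
```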

In interleaved mode, the RTP timestamp for each frame in the payload is derived from the RTP header timestamp and the sum of the time offsets of all preceding frames in this payload. The frame timestamps are computed based on displacement fields and the frame duration derived from the ISF value. Note that the displacement in time between frame i-1 and frame i is (DISi + 1) * frame duration because the duration of the (i-1):th frame must also be taken into account. The timestamp of the first frame of the first group of frames (TS(1)) (i.e., the first frame of the payload) is the RTP header timestamp. For subsequent frames in the group, the timestamp is computed by

      TS(i) = TS(i-1) + (DISi + 1) * frame duration,    2 <= i <= n
        

For subsequent groups of frames, the timestamp of the first frame is computed by

      TS(1) = TSprev + (DIS1 + 1) * frame duration,

where TSprev denotes the timestamp of the last frame in the previous group. The timestamps of the subsequent frames in the group are computed in the same way as for the first group.

The following example derives the RTP timestamps for the frames in an interleaved mode payload having the following header and ToC information:

   RTP header timestamp: 12345
   ISF = 32 kHz
   Frame 1 displacement field: DIS1 = 0
   Frame 2 displacement field: DIS2 = 6
   Frame 3 displacement field: DIS3 = 4
   Frame 4 displacement field: DIS4 = 7
        

Assuming an ISF of 32 kHz, which implies a frame duration of 16 ms, one frame lasts 1152 ticks. The timestamp of the first frame in the payload is the RTP timestamp, i.e., TS(1) = RTP TS. Note that the displacement field value for this frame must be ignored. For the second frame in the payload, the timestamp can be calculated as TS(2) = TS(1) + (DIS2 + 1) * 1152 = 20409. For the third frame, the timestamp is TS(3) = TS(2) + (DIS3 + 1) * 1152 = 26169. Finally, for the fourth frame of the payload, we have TS(4) = TS(3) + (DIS4 + 1) * 1152 = 35385.

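The same calculation can be expressed as a short Python sketch. It is illustrative only: the displacement values of all frames are passed in payload order across all ToC entries, which works because the rule for the first frame of a subsequent group has the same form as the rule within a group. The displacement of the very first frame is ignored, as required in Section 4.3.2.2. The function name is hypothetical.

```python
def interleaved_timestamps(header_ts, displacements, ticks_per_frame):
    """RTP timestamps of the frames in an interleaved-mode payload.

    displacements: DIS values of all frames, in payload order.
    """
    timestamps = []
    ts = header_ts
    for i, dis in enumerate(displacements):
        if i > 0:  # DIS of the first frame in the payload is ignored
            ts = (ts + (dis + 1) * ticks_per_frame) % 2**32
        timestamps.append(ts)
    return timestamps


print(interleaved_timestamps(12345, [0, 6, 4, 7], 1152))
# [12345, 20409, 26169, 35385]
```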

4.3.2.4. Frame Type Considerations

The value of Frame Type (FT) is defined in Table 25 in [1]. FT=14 (AUDIO_LOST) is used to denote frames that are lost. A NO_DATA (FT=15) frame can result from two situations: first, no data has been produced by the audio encoder; second, no data is transmitted in the current payload. An example of the latter is a frame that has been or will be sent in an earlier or later packet. The duration for these non-included frames is dependent on the internal sampling frequency indicated by the ISF field.

For frame types with index 0-13, the ISF field SHALL be set to 0. The frame duration for these frame types is fixed to 20 ms in time, i.e., 1440 ticks at 72 kHz. For payloads containing only frames of type 0-9, the TFI field SHALL be set to 0 and SHALL be ignored by the receiver. In a payload combining frames of type 0-9 and 10-13, the TFI values need to be set to match the transport frames of type 10-13. Thus, frames of type 0-9 will also have a derived TFI, which is ignored.

4.3.2.5. Other TOC Considerations

If a ToC entry with an undefined FT value is received, the whole packet SHALL be discarded. This is to avoid the loss of data synchronization in the depacketization process, which can result in a severe degradation in audio quality.

Packets containing only NO_DATA frames SHOULD NOT be transmitted. Also, NO_DATA frames at the end of a frame sequence to be carried in a payload SHOULD NOT be included in the transmitted packet. The AMR-WB+ SCR/DTX is identical to the AMR-WB SCR/DTX described in [5] and can only be used in combination with the AMR-WB frame types (0-8).

When multiple groups of frames are present, their ToC entries SHALL be placed in the ToC in order of increasing RTP timestamp value (modulo 2^32) of the first transport frame the ToC entry represents, independent of the payload mode. In basic mode, the frames SHALL be consecutive in time, while in interleaved mode the frames MAY be non-consecutive in time and MAY even have varying inter-frame distances.

4.3.2.6. ToC Examples

The following example illustrates a ToC for three audio frames in basic mode. Note that in this case all audio frames are encoded using the same frame type, i.e., there is only one ToC entry.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| Frame Type1 |  #frames = 3  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

The next example depicts a ToC of three entries in basic mode. Note that in this case the payload also carries three frames, but three ToC entries are needed because the frames of the payload are encoded using different frame types.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1| Frame Type1 |  #frames = 1  |1| Frame Type2 |  #frames = 1  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| Frame Type3 |  #frames = 1  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

The following example illustrates a ToC with two entries in interleaved mode using four-bit displacement fields. The payload includes two groups of frames, the first one including a single frame, and the other one consisting of two frames.

下面的示例说明了使用四位位移字段在交织模式下具有两个条目的ToC。有效载荷包括两组帧,第一组包括单个帧,另一组包括两个帧。

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1| Frame Type1 |  #frames = 1  |  DIS1 |  padd |0| Frame Type2 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  #frames = 2  |  DIS1 |  DIS2 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        
4.3.3. Audio Data
4.3.3. 音频数据

Audio data of a payload consists of zero or more audio frames, as described in the ToC of the payload.

有效载荷的音频数据由零个或多个音频帧组成,如有效载荷的ToC中所述。

ToC entries with FT=14 or 15 represent frame types with a length of 0. Hence, no data SHALL be placed in the audio data section to represent frames of this type.

FT=14或15的ToC条目表示长度为0的帧类型。因此,不得在音频数据段中放置任何数据来表示这种类型的帧。

As already discussed, each audio frame of an extension frame type represents an AMR-WB+ transport frame corresponding to the encoding of 512 samples of audio, sampled with the internal sampling frequency specified by the ISF indicator. As an exception, frame types with index 10-13 are only capable of using a single internal sampling frequency (25600 Hz). The encoding rates (combination of core bit-rate and stereo bit-rate) are indicated in the frame type field of the corresponding ToC entry. The octet length of the audio frame is implicitly defined by the frame type field and is given in Tables 21 and 25 of [1]. The order and numbering notation of the bits are as specified in [1]. For the AMR-WB+ extension frame types and comfort noise frames, the bits are in the order produced by the encoder. The last octet of each audio frame MUST be padded with zeroes at the end if not all bits in the octet are used. In other words, each audio frame MUST be octet-aligned.

如前所述,扩展帧类型的每个音频帧表示对应于512个音频样本编码的AMR-WB+传输帧,以ISF指示符指定的内部采样频率进行采样。作为例外,索引为10-13的帧类型只能使用单个内部采样频率(25600 Hz)。编码速率(核心比特率和立体声比特率的组合)在对应ToC条目的帧类型字段中指示。音频帧的八位字节长度由帧类型字段隐式定义,并在[1]的表21和25中给出。位的顺序和编号符号如[1]所述。对于AMR-WB+扩展帧类型和舒适噪声帧,位按编码器产生的顺序排列。如果没有使用八位字节中的所有位,则每个音频帧的最后一个八位字节必须在末尾用零填充。换句话说,每个音频帧必须是八位组对齐的。

4.3.4. Methods for Forming the Payload
4.3.4. 形成有效载荷的方法

The payload begins with the payload header, followed by the table of contents, which consists of a list of ToC entries.

有效负载以有效负载头开始,然后是目录,该目录由ToC条目列表组成。

The audio data follows the table of contents. All the octets comprising an audio frame SHALL be appended to the payload as a unit. The audio frames are packetized in timestamp order within each group of frames (per ToC entry). The groups of frames are packetized in the same order as their corresponding ToC entries. Note that there are no data octets in a group having a ToC entry with FT=14 or FT=15.

音频数据遵循目录。构成音频帧的所有八位字节应作为一个单元附加到有效载荷上。音频帧在每组帧内按时间戳顺序打包(每个ToC条目)。帧组按与其对应的ToC条目相同的顺序打包。请注意,在具有FT=14或FT=15的ToC条目的组中没有数据八位字节。

4.3.5. Payload Examples
4.3.5. 有效载荷示例

4.3.5.1. Example 1: Basic Mode Payload Carrying Multiple Frames Encoded Using the Same Frame Type

4.3.5.1. 示例1:承载使用相同帧类型编码的多个帧的基本模式有效负载

Figure 4 depicts a payload that carries three AMR-WB+ frames encoded using 14 kbit/s frame type (FT=26) with a frame length of 280 bits (35 bytes). The internal sampling frequency in this example is 25.6 kHz (ISF = 8). The TFI for the first frame is 2, indicating that the first transport frame in this payload is the third in a super-frame. Since this payload is in the basic mode, the subsequent frames of the payload are consecutive frames in decoding order, i.e., the fourth transport frame of the current super-frame and the first transport frame of the next super-frame. Note that because the frames are all encoded using the same frame type, only one ToC entry is required.

图4描述了承载三个AMR-WB+帧的有效载荷,这些帧使用14 kbit/s帧类型(FT=26)编码,帧长度为280位(35字节)。本例中的内部采样频率为25.6 kHz(ISF=8)。第一帧的TFI为2,表示此有效负载中的第一个传输帧是超级帧中的第三个传输帧。由于该有效载荷处于基本模式,因此有效载荷的后续帧是解码顺序上的连续帧,即,当前超级帧的第四传输帧和下一超级帧的第一传输帧。请注意,由于所有帧都使用相同的帧类型进行编码,因此只需要一个ToC条目。

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ISF = 8 | 2 |0|0|  FT = 26    |  #frames = 3  |   f1(0...7)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...           | f1(272...279) |   f2(0...7)   |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | f2(272...279) |   f3(0...7)   | ...                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...                                           | f3(272...279) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 4: An example of a basic mode payload carrying three frames of the same frame type

图4:承载三个相同帧类型帧的基本模式有效载荷示例

4.3.5.2. Example 2: Basic Mode Payload Carrying Multiple Frames Encoded Using Different Frame Types

4.3.5.2. 示例2:承载使用不同帧类型编码的多个帧的基本模式有效载荷

Figure 5 depicts a payload that carries three AMR-WB+ frames; the first frame is encoded using 18.4 kbit/s frame type (FT=33) with a frame length of 368 bits (46 bytes), and the two subsequent frames are encoded using 20 kbit/s frame type (FT=35) having frame length of 400 bits (50 bytes). The internal sampling frequency in this example is 32 kHz (ISF = 10), implying the overall bit-rates of 23 kbit/s for the first frame of the payload, and 25 kbit/s for the subsequent frames. The TFI for the first frame is 3, indicating that the first transport frame in this payload is the fourth in a super-frame. Since this is a payload in the basic mode, the subsequent frames of the payload are consecutive frames in decoding order, i.e., the first and second transport frames of the next super-frame. Note that since the payload carries two different frame types, there are two ToC entries.

图5描述了承载三个AMR-WB+帧的有效载荷;第一帧使用帧长度为368位(46字节)的18.4 kbit/s帧类型(FT=33)编码,随后两帧使用帧长度为400位(50字节)的20 kbit/s帧类型(FT=35)编码。本示例中的内部采样频率为32 kHz(ISF=10),这意味着有效负载的第一帧的总比特率为23 kbit/s,后续帧的总比特率为25 kbit/s。第一帧的TFI为3,表示该有效载荷中的第一传输帧是超级帧中的第四传输帧。由于这是基本模式中的有效载荷,因此有效载荷的后续帧是解码顺序中的连续帧,即,下一超级帧的第一和第二传输帧。请注意,由于有效负载承载两种不同的帧类型,因此有两个ToC条目。

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ISF=10 | 3 |0|1|  FT = 33    |  #frames = 1  |0|  FT = 35    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  #frames = 2  |   f1(0...7)   | ...                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...                           | f1(360...367) |   f2(0...7)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | f2(392...399) |   f3(0...7)   | ...                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...                           | f3(392...399) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 5: An example of a basic mode payload carrying three frames employing two different frame types

图5:承载三个采用两种不同帧类型的帧的基本模式有效载荷示例

4.3.5.3. Example 3: Payload in Interleaved Mode
4.3.5.3. 示例3:交织模式下的有效负载

The example in Figure 6 depicts a payload in interleaved mode, carrying four frames encoded using 32 kbit/s frame type (FT=47) with frame length of 640 bits (80 bytes). The internal sampling frequency is 38.4 kHz (ISF = 13), implying a bit-rate of 48 kbit/s for all frames in the payload. The TFI for the first frame is 0; hence, it is the first transport frame of a super-frame. The displacement fields for the subsequent frames are DIS2=18, DIS3=15, and DIS4=10, which indicates that the subsequent frames have the TFIs of 3, 3, and 2, respectively. The long displacement field flag L in the payload header is set to 1, which results in the use of eight bits for the displacement fields in the ToC entry. Note that since all frames of this payload are encoded using the same frame type, only a single ToC entry is needed. Furthermore, the displacement field for the first frame (corresponding to the first ToC entry with DIS1=0) must be ignored, since its timestamp and TFI are defined by the RTP timestamp and the TFI found in the payload header.

图6中的示例描述了交织模式下的有效负载,承载使用32 kbit/s帧类型(FT=47)编码的四个帧,帧长度为640位(80字节)。内部采样频率为38.4 kHz(ISF=13),这意味着有效负载中所有帧的比特率为48 kbit/s。第一帧的TFI为0;因此,它是超级帧的第一个传输帧。后续帧的位移场为DIS2=18、DIS3=15和DIS4=10,这表明后续帧的TFI分别为3、3和2。有效载荷标题中的长位移字段标志L设置为1,这导致ToC条目中位移字段使用8位。请注意,由于此有效负载的所有帧都使用相同的帧类型进行编码,因此只需要一个ToC条目。此外,必须忽略第一帧的位移字段(对应于DIS1=0的第一个ToC条目),因为其时间戳和TFI由在有效负载报头中找到的RTP时间戳和TFI定义。

The RTP timestamp values of the frames in this example are:

本示例中帧的RTP时间戳值为:

   Frame1: TS1 = RTP Timestamp
   Frame2: TS2 = TS1 + 19 * 960
   Frame3: TS3 = TS2 + 16 * 960
   Frame4: TS4 = TS3 + 11 * 960
        
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ISF=13 | 0 |1|0|  FT = 47    |  #frames = 4  |   DIS1 = 0    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   DIS2 = 18   |   DIS3 = 15   |   DIS4 = 10   |   f1(0...7)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...                           | f1(632...639) |   f2(0...7)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...                           | f2(632...639) |   f3(0...7)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...                           | f3(632...639) |   f4(0...7)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...                           | f4(632...639) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 6: An example of an interleaved mode payload carrying four frames of the same frame type

图6:以相同帧类型承载四个帧的交织模式有效负载示例
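
The timestamp and TFI arithmetic of Example 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, a single ToC entry is assumed, and the frame length in timestamp ticks (960 in this example) is passed in by the caller and assumed constant, i.e., no ISF change occurs within the payload.

```python
def frames_from_displacements(ts1, tfi1, dis, frame_ticks):
    """Derive (timestamp, TFI) for each frame of one ToC entry.

    ts1, tfi1   : RTP timestamp and TFI from the packet and payload header
    dis         : displacement fields DIS1..DISn (DIS1 is ignored)
    frame_ticks : frame length in timestamp ticks (assumed constant)
    """
    frames = [(ts1, tfi1)]
    ts, tfi = ts1, tfi1
    for d in dis[1:]:
        # A displacement of d means the frame lies d + 1 frame slots later.
        ts = (ts + (d + 1) * frame_ticks) % 2**32
        tfi = (tfi + d + 1) % 4
        frames.append((ts, tfi))
    return frames
```

With ts1 = 0, tfi1 = 0, dis = [0, 18, 15, 10], and frame_ticks = 960, the sketch yields offsets of 19 * 960, 35 * 960, and 46 * 960 ticks and TFIs of 3, 3, and 2, in line with the values listed for Figure 6.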

4.4. Interleaving Considerations
4.4. 交错考虑

The use of interleaving requires further considerations. As presented in the example in Section 3.6.2, a given interleaving pattern requires a certain amount of deinterleaving buffer space. This buffer space, expressed as a number of transport frame slots, is indicated by the "interleaving" media type parameter. The number of frame slots needed can be converted into actual memory requirements by considering the 80 bytes per frame used by the largest combination of AMR-WB+'s core and stereo rates.

交织的使用需要进一步考虑。如第3.6.2节中的示例所示,给定的交织模式需要一定数量的解交织缓冲区。该缓冲区空间以传输帧时隙的数量表示,由“交错”媒体类型参数表示。通过考虑AMR-WB+核心速率和立体声速率的最大组合使用的每帧80字节,所需的帧插槽数量可以转换为实际内存需求。
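
As a small worked example of this conversion, the Python sketch below multiplies the number of frame slots by the 80-byte maximum frame size. The function name and the value of 30 slots used in the usage note are purely illustrative assumptions.

```python
MAX_FRAME_OCTETS = 80  # largest core + stereo rate combination, per the text above

def deinterleave_buffer_octets(frame_slots):
    """Upper bound on frame storage for a given 'interleaving' value."""
    if frame_slots < 0:
        raise ValueError("frame_slots must be non-negative")
    return frame_slots * MAX_FRAME_OCTETS
```

A hypothetical "interleaving" value of 30 frame slots would thus correspond to at most 2400 bytes of frame storage, not counting any bookkeeping an implementation keeps per slot.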

The information about the frame buffer size is not always sufficient to determine when it is appropriate to start consuming frames from the interleaving buffer. There are two cases in which additional information is needed: first, when switching of the ISF occurs, and second, when the interleaving pattern changes. The "int-delay" media type parameter is defined to convey this information. It allows a sender to indicate the minimal media time that needs to be present in the buffer before the decoder can start consuming frames from the buffer. Because the sender has full control over ISF changes and the interleaving pattern, it can calculate this value.

关于帧缓冲区大小的信息并不总是足以确定何时开始使用交织缓冲区中的帧是合适的。有两种情况需要额外的信息:第一,当ISF发生切换时,第二,当交织模式改变时。定义“int-delay”媒体类型参数以传递此信息。它允许发送方指示在解码器开始使用缓冲区中的帧之前,缓冲区中需要存在的最小媒体时间。因为发送方可以完全控制ISF更改和交织模式,所以它可以计算该值。

In certain cases (for example, if joining a multicast session with interleaving mid-session), a receiver may initially receive only part of the packets in the interleaving pattern. This initial partial reception (in frame sequence order) of frames can yield too few frames for acceptable quality from the audio decoding. This problem also arises when using encryption for access control, and the receiver does not have the previous key.

在某些情况下(例如,如果加入具有交织中间会话的多播会话),接收机最初可能仅接收交织模式中的部分分组。帧的这种初始部分接收(按帧序列顺序)可能会产生太少的帧,无法从音频解码获得可接受的质量。当使用加密进行访问控制,并且接收方没有以前的密钥时,也会出现此问题。

Although AMR-WB+ is robust and thus tolerant to a high random frame erasure rate, it would have difficulties handling consecutive frame losses at startup. Thus, some special implementation considerations are described. In order to handle this type of startup efficiently, it must be noted that decoding can only start at the beginning of a super-frame, and that holds true even if the first transport frame is indicated as lost. Secondly, decoding is only RECOMMENDED to start if at least 2 transport frames are available out of the 4 belonging to that super-frame.

尽管AMR-WB+非常健壮,因此能够承受高随机帧擦除率,但它在启动时处理连续帧丢失时会遇到困难。因此,描述了一些特殊的实现注意事项。为了有效地处理这种类型的启动,必须注意,解码仅可能在超级帧的开始处开始,并且即使第一传输帧被指示为丢失,这仍然成立。其次,仅当属于该超级帧的4个传输帧中至少有2个可用时,才建议开始解码。
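
The startup recommendation above could be checked along the following lines. This Python sketch is an assumption-laden illustration: the function name is hypothetical, and the caller is assumed to have already collected the TFIs (0 to 3) of the transport frames received for one candidate super-frame.

```python
def may_start_decoding(received_tfis):
    """Return True if at least 2 of the 4 transport frames of a
    super-frame are available, per the recommendation above.

    Decoding would still begin at the super-frame boundary (TFI 0),
    with any missing transport frames handed to error concealment.
    """
    available = set(received_tfis) & {0, 1, 2, 3}
    return len(available) >= 2
```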

After receiving a number of packets, in the worst case as many packets as the interleaving pattern covers, the previously described effects disappear and normal decoding is resumed.

在接收到多个分组之后,在最坏的情况下,与交织模式覆盖的分组一样多的分组,先前描述的效果消失并且恢复正常解码。

Similar issues arise when a receiver leaves a session or has lost access to the stream. If the receiver leaves the session, this would be a minor issue since playout is normally stopped. It is also a minor issue for the case of lost access, since the AMR-WB+ error concealment will fade out the audio if massive consecutive losses are encountered.

当接收器离开会话或失去对流的访问时,也会出现类似的问题。如果接收器离开会话,这将是一个小问题,因为播放通常会停止。这对于丢失访问也是一个小问题,因为如果遇到大量连续丢失,AMR-WB+错误隐藏将使音频淡出。

The sender can avoid this type of problem in many sessions by starting and ending interleaving patterns correctly when risks of losses occur. One such example is a key-change done for access control to encrypted streams. If only some keys are provided to clients and there is a risk of their receiving content for which they do not have the key, it is recommended that interleaving patterns not overlap key changes.

发送方可以通过在发生丢失风险时正确启动和结束交织模式,在许多会话中避免此类问题。一个这样的例子是为对加密流进行访问控制而进行的密钥更改。如果只向客户端提供了一些密钥,并且存在接收到没有密钥的内容的风险,建议交错模式不要与密钥更改重叠。

4.5. Implementation Considerations
4.5. 实施考虑

An application implementing this payload format MUST understand all the payload parameters. Any mapping of the parameters to a signaling protocol MUST support all parameters. So an implementation of this payload format in an application using SDP is required to understand all the payload parameters in their SDP-mapped form. This requirement ensures that an implementation can always decide whether it is capable of communicating.

实现此有效负载格式的应用程序必须了解所有有效负载参数。参数到信令协议的任何映射都必须支持所有参数。因此,需要在使用SDP的应用程序中实现此有效负载格式,才能理解SDP映射形式中的所有有效负载参数。此要求确保实现始终可以决定是否能够通信。

Both basic and interleaved mode SHALL be implemented. The implementation burden of both is rather small, and requiring both ensures interoperability. As the AMR-WB+ codec contains the full functionality of the AMR-WB codec, it is RECOMMENDED to also implement the payload format in RFC 3267 [7] for the AMR-WB frame types when implementing this specification. Doing so makes interoperability with devices that only support AMR-WB more likely.

应实现基本模式和交错模式。两者的实现负担都相当小,并且要求两者都能确保互操作性。由于AMR-WB+编解码器包含AMR-WB编解码器的全部功能,因此在实施本规范时,建议还针对AMR-WB帧类型实施RFC 3267[7]中的有效负载格式。这样做使得与仅支持AMR-WB的设备的互操作性更有可能。

The switching of ISF, when combined with packet loss, could result in concealment using the wrong audio frame length. This can occur if packet losses result in lost frames directly after the point of ISF change. The packet loss would prevent the receiver from noticing the changed ISF and thereby conceal the lost transport frame with the previous ISF, instead of the new one. Although always later detectable, such an error results in frame boundary misalignment, which can cause audio distortions and problems with synchronization, as too many or too few audio samples were created. This problem can be mitigated in most cases by performing ISF recovery prior to concealment as outlined in Section 4.5.1.

当ISF的切换与数据包丢失相结合时,可能会导致使用错误的音频帧长度进行隐藏。如果数据包丢失直接导致ISF更改点后的帧丢失,则可能发生这种情况。数据包丢失将阻止接收方注意到已更改的ISF,从而用以前的ISF而不是新的ISF隐藏丢失的传输帧。尽管这种错误在以后总是可以检测到,但它会导致帧边界不对齐,这可能会导致音频失真和同步问题,因为创建的音频样本太多或太少。在大多数情况下,如第4.5.1节所述,在隐藏之前执行ISF恢复可以缓解此问题。

4.5.1. ISF Recovery in Case of Packet Loss
4.5.1. 包丢失情况下的ISF恢复

In case of packet loss, it is important that the AMR-WB+ decoder initiates a proper error concealment to replace the frames carried in the lost packet. A loss concealment algorithm requires a codec framing that matches the timestamps of the correctly received frames. Hence, it is necessary to recover the timestamps of the lost frames. Doing so is non-trivial because the codec frame length that is associated with the ISF may have changed during the frame loss.

在分组丢失的情况下,重要的是AMR-WB+解码器启动适当的错误隐藏以替换丢失分组中携带的帧。丢失隐藏算法需要与正确接收的帧的时间戳匹配的编解码器帧。因此,有必要恢复丢失帧的时间戳。这样做非常重要,因为与ISF相关联的编解码器帧长度在帧丢失期间可能已更改。

In the following, the recovery of the timestamp information of lost frames is illustrated by the means of an example. Two frames with timestamps t0 and t1 have been received properly, the first one being the last packet before the loss, and the latter one being the first packet after the loss period. The ISF values for these packets are isf0 and isf1, respectively. The TFIs of these frames are tfi0 and tfi1, respectively. The associated frame lengths (in timestamp ticks) are given as L0 and L1, respectively. In this example three frames with timestamps x1 - x3 have been lost. The example further assumes that ISF changes once from isf0 to isf1 during the frame loss period, as shown in the figure below.

在下文中,通过示例说明丢失帧的时间戳信息的恢复。已正确接收到具有时间戳t0和t1的两个帧,第一个帧是丢失前的最后一个分组,后一个帧是丢失周期后的第一个分组。这些数据包的ISF值分别为isf0和isf1。这些帧的TFI分别为tfi0和tfi1。相关帧长度(以时间戳标记为单位)分别表示为L0和L1。在本例中,丢失了三个时间戳为x1-x3的帧。该示例进一步假设在帧丢失期间,ISF从isf0更改为isf1一次,如下图所示。

Since not all information required for the full recovery of the timestamps is generally known in the receiver, an algorithm is needed to estimate the ISF associated with the lost frames. Also, the number of lost frames needs to be recovered.

由于在接收机中并非完全恢复时间戳所需的所有信息通常是已知的,因此需要一种算法来估计与丢失帧相关联的ISF。此外,需要恢复丢失帧的数量。

     |<---L0--->|<---L0--->|<-L1->|<-L1->|<-L1->|

     |   Rxd    |   lost   | lost | lost |  Rxd |
   --+----------+----------+------+------+------+--

     t0         x1         x2     x3     t1

Example Algorithm:

算法示例:

   Start:                              # check for frame loss
   If (t0 + L0) == t1 Then goto End    # no frame loss

   Step 1:                             # check case with no ISF change
   If (isf0 != isf1) Then goto Step 2  # At least one ISF change
   If (isFractional((t1 - t0)/L0)) Then goto Step 3
                                       # More than 1 ISF change

   Return recovered timestamps as
   x(n) = t0 + n*L0 and associated ISF equal to isf0,
   for 0 < n < (t1 - t0)/L0
   goto End

   Step 2:
   Loop initialization: n := 4 - (tfi0 mod 4)
   While n <= (t1-t0)/L0
     Evaluate m := (t1 - t0 - n*L0)/L1
     If (isInteger(m) AND ((tfi0+n+m) mod 4 == tfi1)) Then goto found;
     n := n+4
     endloop
   goto Step 3                         # More than 1 ISF change

   found:
   Return recovered timestamps and ISFs as
   x(i) = t0 + i*L0 and associated ISF equal to isf0, for 0 < i <= n
   x(i) = t0 + n*L0 + (i-n)*L1 and associated ISF equal to isf1,
   for n < i <= n+m
   goto End
        

Step 3: More than 1 ISF change has occurred. Since ISF changes can be assumed to be infrequent, such a situation occurs only if long sequences of frames are lost. In that case it is probably not useful to try to recover the timestamps of the lost frames. Rather, the AMR-WB+ decoder should be reset, and decoding should be resumed starting with the frame with timestamp t1.

步骤3:发生了1个以上的ISF更改。由于可以假设ISF更改不频繁,因此只有在长帧序列丢失时才会出现这种情况。在这种情况下,尝试恢复丢失帧的时间戳可能没有用处。相反,应该重置AMR-WB+解码器,并且应该从具有时间戳t1的帧开始恢复解码。

End:

完:

The above algorithm still does not solve the issue when the receiver buffer depth is shallower than the loss burst. In this kind of case, where the concealment must be done without any knowledge about future frames, the concealment may result in loss of frame boundary alignment. If that occurs, it may be necessary to reset and restart the codec to perform resynchronization.

当接收机缓冲区深度小于丢失突发时,上述算法仍然不能解决该问题。在这种情况下,如果必须在不了解未来帧的情况下进行隐藏,则隐藏可能导致帧边界对齐丢失。如果出现这种情况,可能需要重置并重新启动编解码器以执行重新同步。
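
The example algorithm of this section can be rendered in Python roughly as follows. This is a sketch, not a normative implementation: the function name and return conventions are assumptions, only the recovered timestamps are returned (the ISF to associate with each one follows the pseudocode above), and the final computed timestamp is dropped because it equals t1, which was received.

```python
MOD = 2**32  # RTP timestamps wrap modulo 2^32

def recover_lost_timestamps(t0, t1, l0, l1, isf0, isf1, tfi0, tfi1):
    """Return the timestamps of the frames lost between t0 and t1.

    Returns [] if nothing was lost, or None when more than one ISF
    change has to be assumed (Step 3: reset the decoder instead).
    """
    span = (t1 - t0) % MOD
    if span == l0:                       # Start: no frame loss
        return []
    if isf0 == isf1:                     # Step 1: no ISF change
        if span % l0 != 0:               # fractional frame count
            return None
        return [(t0 + n * l0) % MOD for n in range(1, span // l0)]
    n = 4 - (tfi0 % 4)                   # Step 2: frames up to super-frame end
    while n * l0 <= span:
        rest = span - n * l0
        if rest % l1 == 0:
            m = rest // l1
            if (tfi0 + n + m) % 4 == tfi1:
                xs = [(t0 + i * l0) % MOD for i in range(1, n + 1)]
                xs += [(t0 + n * l0 + (i - n) * l1) % MOD
                       for i in range(n + 1, n + m + 1)]
                return xs[:-1]           # last entry is t1 itself
        n += 4
    return None
```

For the situation drawn in the figure, with illustrative values t0 = 0, L0 = 1440, L1 = 960, tfi0 = 2, tfi1 = 2, and t1 = 4800, the sketch finds n = 2 and m = 2 and returns the three lost timestamps 1440, 2880, and 3840.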

4.5.2. Decoding Validation
4.5.2. 解码验证

If the receiver finds a mismatch between the size of a received payload and the size indicated by the ToC of the payload, the receiver SHOULD discard the packet. This is recommended because decoding a frame parsed from a payload based on erroneous ToC data could severely degrade the audio quality.

如果接收器发现接收到的有效载荷的大小与有效载荷的ToC指示的大小不匹配,则接收器应丢弃该数据包。建议这样做,因为根据错误的ToC数据对从有效负载解析的帧进行解码可能会严重降低音频质量。
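
A size check of this kind might look like the Python sketch below. It assumes the layout seen in the earlier examples: a one-octet payload header, two octets per ToC entry, and, in interleaved mode, one displacement field per frame (four or eight bits, with four-bit fields padded to an octet boundary). The function names are hypothetical, and the per-frame octet lengths are supplied by the caller from the tables in [1] (0 for FT=14 or FT=15).

```python
def toc_entry_octets(n_frames, interleaved=False, long_dis=False):
    """Size of one ToC entry in octets."""
    size = 2  # F bit + 7-bit FT, then the 8-bit #frames field
    if interleaved:
        dis_bits = n_frames * (8 if long_dis else 4)
        size += (dis_bits + 7) // 8  # pad 4-bit fields to a whole octet
    return size

def expected_payload_octets(groups, interleaved=False, long_dis=False):
    """groups: list of (frame_octets, n_frames), one pair per ToC entry."""
    total = 1  # payload header
    for frame_octets, n_frames in groups:
        total += toc_entry_octets(n_frames, interleaved, long_dis)
        total += frame_octets * n_frames
    return total

def should_discard(payload, groups, interleaved=False, long_dis=False):
    """True if the received size disagrees with the size implied by the ToC."""
    return len(payload) != expected_payload_octets(groups, interleaved, long_dis)
```

Applied to the payloads of Figures 4, 5, and 6, this gives 108, 151, and 327 octets, respectively.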

5. Congestion Control
5. 拥塞控制

The general congestion control considerations for transporting RTP data apply; see RTP [3] and any applicable RTP profile like AVP [9]. However, the multi-rate capability of AMR-WB+ audio coding provides a mechanism that may help to control congestion, since the bandwidth demand can be adjusted (within the limits of the codec) by selecting a different coding frame type or lower internal sampling rate.

传输RTP数据的一般拥塞控制注意事项适用;参见RTP[3]和任何适用的RTP配置文件,如AVP[9]。然而,AMR-WB+音频编码的多速率能力提供了一种可能有助于控制拥塞的机制,因为可以通过选择不同的编码帧类型或更低的内部采样率来调整带宽需求(在编解码器的限制范围内)。

The number of frames encapsulated in each RTP payload highly influences the overall bandwidth of the RTP stream due to header overhead constraints. Packetizing more frames in each RTP payload can reduce the number of packets sent and hence the header overhead, at the expense of increased delay and reduced error robustness.

由于报头开销限制,封装在每个RTP有效负载中的帧的数量高度影响RTP流的总体带宽。在每个RTP有效负载中打包更多帧可以减少发送的数据包数量,从而减少报头开销,但代价是增加延迟和降低错误鲁棒性。

If forward error correction (FEC) is used, the amount of FEC-induced redundancy needs to be regulated such that the use of FEC itself does not cause a congestion problem.

如果使用前向纠错(FEC),则需要调节FEC引起的冗余量,以便FEC本身的使用不会导致拥塞问题。

6. Security Considerations
6. 安全考虑

RTP packets using the payload format defined in this specification are subject to the general security considerations discussed in RTP [3] and any applicable profile such as AVP [9] or SAVP [10]. As this format transports encoded audio, the main security issues include confidentiality, integrity protection, and data origin authentication of the audio itself. The payload format itself does not have any built-in security mechanisms. Any suitable external mechanisms, such as SRTP [10], MAY be used.

使用本规范中定义的有效负载格式的RTP数据包受RTP[3]和任何适用配置文件(如AVP[9]或SAVP[10])中讨论的一般安全注意事项的约束。由于这种格式传输编码音频,主要的安全问题包括机密性、完整性保护和音频本身的数据源身份验证。有效负载格式本身没有任何内置的安全机制。可以使用任何合适的外部机制,如SRTP[10]。

This payload format and the AMR-WB+ decoder do not exhibit any significant non-uniformity in the receiver-side computational complexity for packet processing, and thus are unlikely to pose a denial-of-service threat due to the receipt of pathological data.

该有效载荷格式和AMR-WB+解码器在用于分组处理的接收器端计算复杂度方面没有表现出任何显著的非均匀性,因此不太可能由于接收病理数据而造成拒绝服务威胁。

6.1. Confidentiality
6.1. 保密性

In order to ensure confidentiality of the encoded audio, all audio data bits MUST be encrypted. There is less need to encrypt the payload header or the table of contents since they only carry information about the frame type. This information could also be useful to a third party, for example, for quality monitoring.

为了确保编码音频的机密性,必须对所有音频数据位进行加密。不需要加密有效负载头或目录,因为它们只携带有关帧类型的信息。这些信息也可能对第三方有用,例如用于质量监控。

The use of interleaving in conjunction with encryption can have a negative impact on confidentiality, for a short period of time. Consider the following packets (in brackets) containing frame numbers as indicated: {10, 14, 18}, {13, 17, 21}, {16, 20, 24} (a popular continuous diagonal interleaving pattern). The originator wishes to deny some participants the ability to hear material starting at time 16. Simply changing the key on the packet with the timestamp at or after 16, and denying that new key to those participants, does not achieve this; frames 17, 18, and 21 have been supplied in prior packets under the prior key, and error concealment may make the audio intelligible at least as far as frame 18 or 19, and possibly further.

在短时间内,将交织与加密结合使用会对保密性产生负面影响。考虑以下包含所示帧编号的数据包(括号内):{10, 14, 18}、{13, 17, 21}、{16, 20, 24}(一种常用的连续对角交织模式)。发起者希望剥夺一些参与者从时间16开始收听内容的能力。仅仅更改时间戳为16或之后的数据包的密钥,并拒绝向这些参与者提供新密钥,并不能实现这一点;帧17、18和21已在先前的数据包中使用先前的密钥提供,并且错误隐藏可能使音频至少到帧18或19仍可理解,甚至可能更远。
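
The effect described in this example can be reproduced with a few lines of Python. The sketch assumes that each packet is given as a list of frame numbers in timestamp order, so that the first element stands for the packet timestamp; the function name is hypothetical.

```python
def frames_leaked_under_old_key(packets, cutoff):
    """Frames at or after `cutoff` that were sent in packets whose
    timestamp precedes `cutoff`, i.e., under the prior key."""
    leaked = []
    for frames in packets:
        if frames[0] < cutoff:  # packet still protected with the prior key
            leaked.extend(f for f in frames if f >= cutoff)
    return sorted(leaked)
```

For the packets {10, 14, 18}, {13, 17, 21}, {16, 20, 24} and a key change at time 16, the function returns frames 17, 18, and 21, matching the text above.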

6.2. Authentication and Integrity

To authenticate the sender of the speech, an external mechanism MUST be used. It is RECOMMENDED that such a mechanism protects both the complete RTP header and the payload (speech and data bits).

Data tampering by a man-in-the-middle attacker could replace audio content and also result in erroneous depacketization/decoding that could lower the audio quality.

7. Payload Format Parameters

This section defines the parameters that may be used to select features of the AMR-WB+ payload format. The parameters are defined as part of the media type registration for the AMR-WB+ audio codec. A mapping of the parameters into the Session Description Protocol (SDP) [6] is also provided for those applications that use SDP. Equivalent parameters could be defined elsewhere for use with control protocols that do not use MIME or SDP.

The data format and parameters are only specified for real-time transport in RTP.

7.1. Media Type Registration

The media type for the Extended Adaptive Multi-Rate Wideband (AMR-WB+) codec is allocated from the IETF tree, since AMR-WB+ is expected to be a widely used audio codec in general streaming applications.

Note: Parameters not listed below MUST be ignored by the receiver.

Media Type name: audio

Media subtype name: AMR-WB+

Required parameters:

None

Optional parameters:

channels: The maximum number of audio channels used by the audio frames. Permissible values are 1 (mono) or 2 (stereo). If no parameter is present, the maximum number of channels is 2 (stereo). Note: Setting this parameter to 1 implicitly excludes the use of the stereo frame types.

interleaving: Indicates that interleaved mode SHALL be used for the payload. The parameter specifies the number of transport frame slots required in a deinterleaving buffer (including the frame that is ready to be consumed). Its value is equal to one plus the maximum number of frames that precede any frame in transmission order and follow the frame in RTP timestamp order. The value MUST be greater than zero. If this parameter is not present, interleaved mode SHALL NOT be used.

int-delay: The minimal media time delay in RTP timestamp ticks that is needed in the deinterleaving buffer, i.e., the difference in RTP timestamp ticks between the earliest and latest audio frame present in the deinterleaving buffer.

ptime: See Section 6 in RFC 2327 [6].

maxptime: See Section 8 in RFC 3267 [7].

Restriction on Usage: This type is only defined for transfer via RTP (STD 64).

Encoding considerations: An RTP payload according to this format is binary data and thus may need to be appropriately encoded in non-binary environments. However, as long as used within RTP, no encoding is necessary.

Security considerations: See Section 6 of RFC 4352.

Interoperability considerations: To maintain interoperability with AMR-WB-capable end-points, in cases where negotiation is possible and the AMR-WB+ end-point supporting this format also supports RFC 3267 for AMR-WB transport, an AMR-WB+ end-point SHOULD declare itself also as AMR-WB capable (i.e., supporting also "audio/AMR-WB" as specified in RFC 3267).

As the AMR-WB+ decoder is capable of performing stereo-to-mono conversion, all receivers of AMR-WB+ should be able to receive both stereo and mono, even if the receiver is only capable of playing out mono signals.

Public specification: RFC 4352; 3GPP TS 26.290, see reference [1] of RFC 4352

Additional information: This MIME type is not applicable for file storage. Instead, file storage of AMR-WB+ encoded audio is specified within the 3GPP-defined ISO-based multimedia file format defined in 3GPP TS 26.244; see reference [14] of RFC 4352. This file format has the MIME types "audio/3GPP" or "video/3GPP" as defined by RFC 3839 [15].

Person & email address to contact for further information: magnus.westerlund@ericsson.com, ari.lakaniemi@nokia.com

Intended usage: COMMON. It is expected that many IP-based streaming applications will use this type.

Change controller: IETF Audio/Video Transport working group delegated from the IESG.
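
As an illustration of the "interleaving" parameter defined in the registration above, the following Python sketch computes its value from a known transmission order. It is only a sketch: the function name interleaving_value is hypothetical, frame numbers stand in for RTP timestamps, and the frame order is borrowed from the interleaving example in Section 6.1.

```python
def interleaving_value(transmission_order):
    """One plus the maximum number of frames that precede a frame in
    transmission order but follow it in timestamp order."""
    worst = 0
    for i, frame in enumerate(transmission_order):
        later_but_sent_earlier = sum(
            1 for earlier in transmission_order[:i] if earlier > frame
        )
        worst = max(worst, later_but_sent_earlier)
    return worst + 1


order = [10, 14, 18, 13, 17, 21, 16, 20, 24]
print(interleaving_value(order))  # 4
```

Frame 16 is the worst case here: frames 17, 18, and 21 are sent before it, so three later frames are waiting in the buffer when it arrives. For a stream sent strictly in timestamp order the function returns 1, the smallest value the definition can yield.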

7.2. Mapping Media Type Parameters into SDP

The information carried in the media type specification has a specific mapping to fields in the Session Description Protocol (SDP) [6], which is commonly used to describe RTP sessions. When SDP is used to specify an RTP session using this RTP payload format, the mapping is as follows:

- The media type ("audio") is used in SDP "m=" as the media name.

- The media subtype (payload format name) is used in SDP "a=rtpmap" as the encoding name. The RTP clock rate in "a=rtpmap" SHALL be 72000 for AMR-WB+, and the encoding parameter "number of channels" MUST either be explicitly set to 1 or 2 or be omitted, which implies the default value of 2.

- The parameters "ptime" and "maxptime" are placed in the SDP attributes "a=ptime" and "a=maxptime", respectively.

- Any remaining parameters are placed in the SDP "a=fmtp" attribute by copying them directly from the MIME media type string as a semicolon-separated list of parameter=value pairs.
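
The mapping rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name amr_wb_plus_sdp, the use of a dictionary for the remaining parameters, and the fixed "RTP/AVP" profile are assumptions made for this sketch.

```python
def amr_wb_plus_sdp(port, payload_type, channels=None, fmtp=None,
                    ptime=None, maxptime=None):
    """Build SDP media-level lines for AMR-WB+ following the mapping above."""
    if channels not in (None, 1, 2):
        raise ValueError("channels must be 1, 2, or omitted")
    # Encoding name is the media subtype; the clock rate is always 72000.
    rtpmap = f"a=rtpmap:{payload_type} AMR-WB+/72000"
    if channels is not None:
        rtpmap += f"/{channels}"  # omitted means the default of 2
    lines = [f"m=audio {port} RTP/AVP {payload_type}", rtpmap]
    if fmtp:
        # Remaining parameters go into a=fmtp as parameter=value pairs.
        pairs = "; ".join(f"{name}={value}" for name, value in fmtp.items())
        lines.append(f"a=fmtp:{payload_type} {pairs}")
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines


for line in amr_wb_plus_sdp(49120, 99, channels=2,
                            fmtp={"interleaving": 30, "int-delay": 86400},
                            maxptime=100):
    print(line)
```

Called with these arguments, the sketch emits an "m=" line, an "a=rtpmap" line, one "a=fmtp" line carrying both remaining parameters, and an "a=maxptime" line.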

7.2.1. Offer-Answer Model Considerations

To achieve good interoperability in an Offer-Answer [8] negotiation usage, the following considerations should be taken into account:

For negotiable offer/answer usage the following interpretation rules SHALL be applied:

- The "interleaving" parameter is symmetric, so the answerer must also include it when answering an offered payload type that contains the parameter. However, in unicast usage the buffer space value is declarative. In multicast usage, the response must carry the same value in order to accept the payload type. For streams declared as sendrecv or recvonly, the receiver will accept streams that use the interleaved mode of the payload format, and the value declares the amount of buffer space the receiver has available for the sender to utilize. For sendonly streams, the parameter indicates the desired configuration and amount of buffer space. It is RECOMMENDED that an answerer respond with the offered value if it is capable of using it.

- The "int-delay" parameter is declarative. For streams declared as sendrecv or recvonly, the value indicates the maximum initial delay the receiver will accept in the deinterleaving buffer. For sendonly streams, the value is the amount of media time the sender desires to use. The value SHOULD be copied into any response.

- The "channels" parameter is declarative. For "sendonly" streams, it indicates the desired channel usage: stereo and mono, or mono only. For "recvonly" and "sendrecv" streams, the parameter indicates what the receiver is willing to accept. As any receiver is capable of receiving stereo frame types and performing local mixing within the AMR-WB+ decoder, there is normally only one reason to restrict a stream to mono: to avoid spending bit-rate on data that is not utilized when the front-end is only capable of mono.

- The "ptime" parameter works as indicated by the offer/answer model [8]; "maxptime" SHALL be used in the same way.

- To maintain interoperability with AMR-WB in cases where negotiation is possible, an AMR-WB+ capable end-point that also implements the AMR-WB payload format [7] is RECOMMENDED to declare itself capable of AMR-WB as it is a subset of the AMR-WB+ codec.
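
One possible reading of the "interleaving" and "int-delay" rules above is sketched below in Python. Everything here is an assumption made for illustration: the function name answer_fmtp, the dictionary representation of the offered parameters, the max_interleaving capability value, and the interpretation of "capable of using it" as "the offered value fits in the local deinterleaving buffer".

```python
def answer_fmtp(offer, max_interleaving=None, multicast=False):
    """Return answer parameters for an offered AMR-WB+ payload type, or
    None if the payload type cannot be accepted.

    offer: dict of offered fmtp parameters (hypothetical representation).
    max_interleaving: deinterleaving buffer slots this end-point supports,
        or None if it does not implement interleaved mode.
    """
    answer = {}
    if "interleaving" in offer:
        offered = int(offer["interleaving"])
        if max_interleaving is None:
            return None  # symmetric parameter: it cannot simply be dropped
        if offered <= max_interleaving:
            answer["interleaving"] = offered  # RECOMMENDED: reuse the offer
        elif multicast:
            return None  # multicast requires the same value in the answer
        else:
            answer["interleaving"] = max_interleaving  # declarative in unicast
    if "int-delay" in offer:
        answer["int-delay"] = int(offer["int-delay"])  # SHOULD be copied
    return answer


print(answer_fmtp({"interleaving": "30", "int-delay": "86400"},
                  max_interleaving=40))
```

Returning None models rejecting the payload type; how that rejection is signalled is left to the surrounding offer/answer machinery and is not shown.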

In declarative usage, like SDP in RTSP [16] or SAP [17], the parameters SHALL be interpreted as follows:

- The "interleaving" parameter, if present, configures the payload format in that mode, and the value indicates the number of frames that the deinterleaving buffer is required to support to be able to handle this session correctly.

- The "int-delay" parameter indicates the initial buffering delay required to receive this stream correctly.

- The "channels" parameter indicates whether the content being transmitted can contain both stereo and mono rates or only mono.

- All other parameters indicate values that are being used by the sending entity.

7.2.2. Examples

One example of an SDP session description utilizing AMR-WB+ mono and stereo encoding follows.

    m=audio 49120 RTP/AVP 99
    a=rtpmap:99 AMR-WB+/72000/2
    a=fmtp:99 interleaving=30; int-delay=86400
    a=maxptime:100
        

Note that the payload format (encoding) names are commonly shown in uppercase. Media subtypes are commonly shown in lowercase. These names are case-insensitive in both places. Similarly, parameter names are case-insensitive both in MIME types and in the default mapping to the SDP a=fmtp attribute.
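
A receiver-side counterpart can be sketched as follows. The Python below is illustrative only: the function name parse_fmtp and the choice to lower-case parameter names are assumptions for this sketch, made so that the case-insensitivity of parameter names noted above is handled in one place. It also converts the "int-delay" value to seconds using the 72000 Hz RTP clock rate.

```python
def parse_fmtp(line):
    """Parse an SDP a=fmtp line into (payload_type, {name: value}).

    Parameter names are lower-cased because they are case-insensitive.
    """
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        if item.strip():
            name, _, value = item.partition("=")
            params[name.strip().lower()] = value.strip()
    return int(payload_type), params


pt, params = parse_fmtp("a=fmtp:99 Interleaving=30; INT-DELAY=86400")
delay_seconds = int(params["int-delay"]) / 72000  # RTP clock rate for AMR-WB+
print(pt, params, delay_seconds)
```

With the values from the example session description, an "int-delay" of 86400 ticks corresponds to 1.2 seconds of media time.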

8. IANA Considerations

The IANA has registered one new MIME subtype (audio/amr-wb+); see Section 7.

9. Contributors

Daniel Enstrom contributed by writing the codec introduction section. Stefan Bruhn contributed by writing the ISF recovery algorithm.

10. Acknowledgements

The authors would like to thank Redwan Salami and Stefan Bruhn for their significant contributions made throughout the writing and reviewing of this document. Dave Singer contributed by reviewing and suggesting improved language. Anisse Taleb and Ingemar Johansson contributed by implementing the payload format and thus helped locate some flaws. We would also like to acknowledge Qiaobing Xie, coauthor of RFC 3267, on which this document is based.

11. References

11.1. Normative References

[1] 3GPP TS 26.290 "Audio codec processing functions; Extended Adaptive Multi-Rate Wideband (AMR-WB+) codec; Transcoding functions", version 6.3.0 (2005-06), 3rd Generation Partnership Project (3GPP).

[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[3] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[4] 3GPP TS 26.192 "AMR Wideband speech codec; Comfort Noise aspects", version 6.0.0 (2004-12), 3rd Generation Partnership Project (3GPP).

[5] 3GPP TS 26.193 "AMR Wideband speech codec; Source Controlled Rate operation", version 6.0.0 (2004-12), 3rd Generation Partnership Project (3GPP).

[6] Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

[7] Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie, "Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs", RFC 3267, June 2002.

[8] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.

[9] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

11.2. Informative References

[10] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.

[11] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for Generic Forward Error Correction", RFC 2733, December 1999.

[12] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, September 1997.

[13] 3GPP TS 26.233 "Packet Switched Streaming service", version 5.7.0 (2005-03), 3rd Generation Partnership Project (3GPP).

[14] 3GPP TS 26.244 "Transparent end-to-end packet switched streaming service (PSS); 3GPP file format (3GP)", version 6.4.0 (2005-09), 3rd Generation Partnership Project (3GPP).

[15] Castagno, R. and D. Singer, "MIME Type Registrations for 3rd Generation Partnership Project (3GPP) Multimedia files", RFC 3839, July 2004.

[16] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming Protocol (RTSP)", RFC 2326, April 1998.

[17] Handley, M., Perkins, C., and E. Whelan, "Session Announcement Protocol", RFC 2974, October 2000.

[18] 3GPP TS 26.140 "Multimedia Messaging Service (MMS); Media formats and codecs", version 6.2.0 (2005-03), 3rd Generation Partnership Project (3GPP).

[19] 3GPP TS 26.346 "Multimedia Broadcast/Multicast Service (MBMS); Protocols and codecs", version 6.3.0 (2005-12), 3rd Generation Partnership Project (3GPP).

Any 3GPP document can be downloaded from the 3GPP webserver, "http://www.3gpp.org/", see specifications.

Authors' Addresses

   Johan Sjoberg
   Ericsson Research
   Ericsson AB
   SE-164 80 Stockholm
   SWEDEN

   Phone: +46 8 7190000
   EMail: Johan.Sjoberg@ericsson.com

   Magnus Westerlund
   Ericsson Research
   Ericsson AB
   SE-164 80 Stockholm
   SWEDEN

   Phone: +46 8 7190000
   EMail: Magnus.Westerlund@ericsson.com

   Ari Lakaniemi
   Nokia Research Center
   P.O. Box 407
   FIN-00045 Nokia Group
   FINLAND

   Phone: +358-71-8008000
   EMail: ari.lakaniemi@nokia.com

   Stephan Wenger
   Nokia Corporation
   P.O. Box 100
   FIN-33721 Tampere
   FINLAND

   Phone: +358-50-486-0637
   EMail: Stephan.Wenger@nokia.com

Full Copyright Statement

Copyright (C) The Internet Society (2006).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgement

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).
