Network Working Group                                            B. Link
Request for Comments: 4598                            Dolby Laboratories
Category: Standards Track                                      July 2006

Real-time Transport Protocol (RTP) Payload Format for Enhanced AC-3 (E-AC-3) Audio


Status of This Memo


This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.


Copyright Notice


Copyright (C) The Internet Society (2006).


Abstract

This document describes a Real-time Transport Protocol (RTP) payload format for transporting Enhanced AC-3 (E-AC-3) encoded audio data. E-AC-3 is a high-quality, multichannel audio coding format and is an extension of the AC-3 audio coding format, which is used in US High-Definition Television (HDTV), DVD, cable and satellite television, and other media. E-AC-3 is an optional audio format in US and worldwide digital television and high-definition DVD formats. The RTP payload format as presented in this document includes support for data fragmentation.


Table of Contents


   1. Introduction
   2. Overview of Enhanced-AC-3
      2.1. E-AC-3 Bit Stream
           2.1.1. Sync Frames and Audio Blocks
           2.1.2. Programs and Substreams
           2.1.3. Frame Sets
   3. RTP E-AC-3 Header Fields
   4. RTP E-AC-3 Payload Format
      4.1. Payload Specific Header
      4.2. Fragmentation of E-AC-3 Frames
      4.3. Concatenation of E-AC-3 Frames
      4.4. Carriage of AC-3 Frames
   5. Types and Names
      5.1. Media Type Registration
      5.2. SDP Usage
   6. Security Considerations
   7. Congestion Control
   8. IANA Considerations
   9. References
      9.1. Normative References
      9.2. Informative References
1. Introduction

The Enhanced AC-3 (E-AC-3) [ETSI] audio coding system is built on a foundation of AC-3. It is an enhancement and extension to AC-3, which is an existing audio coding standard commonly used for DVD, broadcast, cable, and satellite television content. E-AC-3 is designed to enable operation at both higher and lower data rates than AC-3, provide expanded channel configurations, and provide greater flexibility for carriage of multiple audio program elements. The relationship between E-AC-3 and AC-3 provides for low-loss, low-cost conversion between the two and makes E-AC-3 especially suitable in applications that require compatibility with the existing broadcast-reception and audio/video decoding infrastructure. Dolby Digital Plus is a branded version of Enhanced AC-3.


E-AC-3 has been standardized within both the European Telecommunications Standards Institute (ETSI) and the Advanced Television Systems Committee (ATSC). It is an optional audio format for use in US (ATSC) and Digital Video Broadcasting (DVB) television transmission. It is also a required audio format for use in the High Definition (HD)-DVD optical-storage media format and included in the Blu-ray Disc format.


There is a need to stream E-AC-3 content over IP networks. E-AC-3 is primarily used in audio-for-video applications, so RTP serves well as a transport solution with its mechanism for synchronizing streams. Applications for streaming E-AC-3 include Internet Protocol television (IPTV), video on demand, interactive features of next generation DVD formats, and transfer of movies across a home network.


Section 2 gives a brief overview of the E-AC-3 algorithm. Section 3 specifies values for fields in the RTP header, and Section 4 specifies the E-AC-3 payload format, itself. Section 5 discusses media types and Session Description Protocol (SDP) usage. Security considerations are covered in Section 6, congestion control in Section 7, and IANA considerations in Section 8.


The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].


2. Overview of Enhanced-AC-3

Enhanced AC-3 (E-AC-3) is a frequency-domain perceptual audio coding system. Time blocks of an audio signal are converted from the time domain to the frequency domain by a transform (the Modified Discrete Cosine Transform (MDCT)) so that a model of the human auditory perceptual system can be applied. In this domain, quantization noise can be constrained to specific frequency regions. The perceptual model predicts in which frequency regions the auditory system will be least able to detect the quantization noise from data rate reduction. A more detailed technical description of E-AC-3 can be found in [2004AES].


E-AC-3 is built upon a foundation of AC-3. More background on AC-3 can be found in the AC-3 specification [ETSI], a technical paper [1994AES], and the AC-3 RTP payload format [RFC4184]. The frame structure and meta-data of AC-3 are maintained. E-AC-3 content is not directly compatible with AC-3 decoders, but it can be converted to the AC-3 format to provide compatibility with existing decoders. Because AC-3 is the foundation of E-AC-3, conversion between the two formats can be done in a way that minimizes the degradations associated with tandem coding. In addition, the computational cost of the conversion is reduced compared to a full decode and re-encode.


E-AC-3 exploits psychoacoustic phenomena that cause a significant fraction of the information contained in a typical audio signal to be inaudible. Substantial data reduction occurs via the removal of inaudible information contained in an audio stream. Source coding techniques are further used to reduce the data rate.


Like most perceptual coders, E-AC-3 operates in the frequency domain. A 512-point MDCT is taken with 50% overlap, providing 256 new frequency samples. Frequency samples are then converted to exponents and mantissas. Exponents are differentially encoded. Mantissas are allocated a varying number of bits depending on the audibility of the spectral components associated with them. Audibility is determined via a masking curve. Bits for mantissas are allocated from a global bit pool.
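The exponent/mantissa split and the differential coding of exponents described above can be illustrated with the following minimal Python sketch. It is not the encoder defined in [ETSI]: the MDCT, exponent strategies, grouping of exponent deltas, and bit allocation are all omitted. The function names are hypothetical, and the rule that neighbouring exponents differ by at most 2 (borrowed from AC-3 exponent coding) and the maximum exponent of 24 are assumptions of this sketch.

```python
import math

MAX_EXP = 24  # deepest exponent assumed by this sketch


def raw_exponents(coeffs):
    """Exponent = number of leading zeros of each coefficient in (-1, 1)."""
    exps = []
    for x in coeffs:
        if x == 0.0:
            exps.append(MAX_EXP)
        else:
            _, e2 = math.frexp(x)  # x == m * 2**e2 with 0.5 <= |m| < 1
            exps.append(min(-e2, MAX_EXP))
    return exps


def limit_deltas(exps, max_step=2):
    """Lower exponents until neighbours differ by at most max_step.
    Lowering an exponent never makes a mantissa overflow; it only costs
    precision."""
    e = list(exps)
    for i in range(1, len(e)):               # forward pass
        e[i] = min(e[i], e[i - 1] + max_step)
    for i in range(len(e) - 2, -1, -1):      # backward pass
        e[i] = min(e[i], e[i + 1] + max_step)
    return e


def differential(exps):
    """First exponent sent as-is, the rest as differences."""
    return exps[0], [b - a for a, b in zip(exps, exps[1:])]


def mantissas(coeffs, exps):
    """Scale each coefficient by its (possibly lowered) exponent."""
    return [x * 2.0 ** e for x, e in zip(coeffs, exps)]


coeffs = [0.5, 0.3, 0.001, 0.0]
exps = limit_deltas(raw_exponents(coeffs))
print(raw_exponents(coeffs))  # [0, 1, 9, 24]
print(exps)                   # [0, 1, 3, 5]
print(differential(exps))     # (0, [1, 2, 2])
```

Exponents are only ever lowered, never raised, so every mantissa stays below 1 in magnitude at the cost of some precision for coefficients whose exponent was reduced.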


E-AC-3 adds new coding tools, such as a longer filter bank, vector quantization, and spectral extension, to provide greater data efficiency and to operate at lower data rates than AC-3. In the other direction, an expanded bit stream syntax and new frame constraints permit operation at higher data rates than AC-3. The E-AC-3 syntax also allows a larger number of audio channels in one bit stream. E-AC-3 operates at data rates from 32 kbps to 6.144 Mbps and at three sampling rates: 32 kHz, 44.1 kHz, and 48 kHz.


E-AC-3 supports the carriage of multiple programs and the carriage of programs with more than a baseline of 5.1 audio channels. Both of these extensions beyond AC-3 are accomplished by time multiplexing additional data with baseline data. In the case of multiple programs, frames with data for the programs are interleaved. In the case of more than 5.1 channels, frames from substreams carrying the extra channels are interleaved with the independent substream that carries a 5.1-channel compatible mix. Both of these forms of multiplexing can occur in the same bit stream. In other words, mixing multiple programs, some or all with more than 5.1 channels, is permitted.


Additional channel capacity is enabled by adding substreams to a program. One primary substream, called the "independent substream", is required for each program. This substream carries a self-contained mix of the audio, using a maximum of 5.1 channels, which makes its channel configuration compatible with AC-3. Then, additional, optional substreams are used in the program to carry additional channels. The data for each additional channel carries an indication of whether that channel provides data for an additional speaker location or replacement data for one of the speaker locations already defined by a previous substream. For example, one common 7.1-channel format uses three front channels and four surround channels. It is packaged with a primary substream, which contains a 5.1-channel downmix of the 7.1-channel content, using left, center, right, left surround, right surround, and low-frequency effects channels. One dependent substream supplies four channels: replacements for left surround and right surround, along with two additional surround channels (left back and right back).


The specification for E-AC-3 [ETSI] requires that all E-AC-3 decoders be capable of decoding at least a baseline portion of any E-AC-3 bit stream, which consists of the first independent substream of the first program, and of ignoring the other elements of the bit stream. This baseline is limited to 5.1 channels, and a system is also able to convert to configurations with fewer channels for a presentation that matches its output capabilities, if needed. More capable decoders can optionally choose among and mix multiple programs, and also decode configurations with more channels than the baseline by decoding dependent substreams.


2.1. E-AC-3 Bit Stream
2.1.1. Sync Frames and Audio Blocks

The basic organizational building block in an E-AC-3 bit stream is the sync frame (also called a frame in this document). A sync frame contains the data necessary to decode time domain audio samples for one or more channels over a time of one or more audio blocks, so a frame is an Application Data Unit (ADU). Each E-AC-3 frame contains a Sync Information (SI) field, a Bit Stream Information (BSI) field, an Audio Frame (AF) field, and up to six audio blocks (ABs). Each AB represents 256 Pulse Code Modulation (PCM) samples for each channel. The frame ends with an optional auxiliary data field (AUX) and an error correction field (CRC). Figure 1 shows the structure of an E-AC-3 frame, where N is the number of blocks in the frame.


           +---+---+---+---------+- ... -+---------+---+---+
           |SI |BSI|AF |  AB(0)  |  ...  | AB(N-1) |AUX|CRC|
           +---+---+---+---------+- ... -+---------+---+---+

Figure 1. E-AC-3 frame format with more than one block


The SI field contains information needed to acquire and maintain codec synchronization. The BSI field contains parameters that describe the coded audio service. It carries an indication of the size of the frame in 16-bit words ('frmsiz', Section E.1.3 of [ETSI]) and an indication of the sampling rate ('fscod'). It also carries an indication of the number of blocks in the frame ('numblkscod'); permitted values are one, two, three, or six blocks. The AF field contains information about coding tools that applies to the entire frame. Each block has a duration of 256 samples, so a frame's duration is the corresponding multiple of 256 samples. The time duration of the frame is also dependent on the sampling rate, as shown in Table 1.


Table 1. Time duration of E-AC-3 frame (number of blocks vs. sampling rate)


   +------------------+--------+-----------------+-----------------+
   | blocks per frame | 32 kHz |        44.1 kHz |          48 kHz |
   +------------------+--------+-----------------+-----------------+
   |                1 |   8 ms |  approx. 5.8 ms |  approx. 5.3 ms |
   |                2 |  16 ms | approx. 11.6 ms | approx. 10.7 ms |
   |                3 |  24 ms | approx. 17.4 ms |           16 ms |
   |                6 |  48 ms | approx. 34.8 ms |           32 ms |
   +------------------+--------+-----------------+-----------------+
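The BSI fields named above ('frmsiz', 'fscod', 'numblkscod') and the stream type fields discussed in Section 2.1.2 sit at fixed bit positions right after the sync word. The following Python sketch reads them and derives a frame's size in bytes and its duration, reproducing Table 1. The bit layout used here (strmtyp 2 bits, substreamid 3, frmsiz 11, fscod 2, numblkscod 2, acmod 3, lfeon 1, bsid 5) is my reading of Annex E of [ETSI] and should be checked against it. Treating bsid values of 10 or less as AC-3 is also an assumption, and reduced sampling rates (fscod = 3) are rejected because this payload format only permits 32, 44.1, and 48 kHz. The names parse_bsi and FrameInfo are hypothetical.

```python
from dataclasses import dataclass

SAMPLE_RATES = {0: 48000, 1: 44100, 2: 32000}  # fscod -> Hz
BLOCKS = {0: 1, 1: 2, 2: 3, 3: 6}              # numblkscod -> audio blocks


@dataclass
class FrameInfo:
    strmtyp: int
    substreamid: int
    frame_bytes: int
    sample_rate: int
    blocks: int

    @property
    def duration_ms(self) -> float:
        # Each audio block carries 256 samples per channel.
        return self.blocks * 256 * 1000 / self.sample_rate


def parse_bsi(frame: bytes) -> FrameInfo:
    if len(frame) < 6:
        raise ValueError("need at least 6 bytes of header")
    if frame[0] != 0x0B or frame[1] != 0x77:
        raise ValueError("missing sync word 0x0B77")
    bsid = frame[5] >> 3
    if bsid <= 10:
        raise ValueError(f"bsid {bsid}: not an E-AC-3 frame (AC-3 layout differs)")
    b2, b3, b4 = frame[2], frame[3], frame[4]
    strmtyp = b2 >> 6
    substreamid = (b2 >> 3) & 0x07
    frmsiz = ((b2 & 0x07) << 8) | b3      # 16-bit words in the frame, minus one
    fscod = b4 >> 6
    if fscod == 3:
        raise ValueError("reduced sampling rates (fscod = 3) are not handled")
    numblkscod = (b4 >> 4) & 0x03
    return FrameInfo(strmtyp, substreamid, (frmsiz + 1) * 2,
                     SAMPLE_RATES[fscod], BLOCKS[numblkscod])


info = parse_bsi(bytes([0x0B, 0x77, 0x02, 0xFF, 0x3F, 0x80]))
print(info.frame_bytes, info.sample_rate, info.blocks, info.duration_ms)  # 1536 48000 6 32.0
```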

Each audio block contains header fields that indicate the use of various coding tools: block switching, dither, coupling, spectral extension, and exponent strategy. They also contain metadata, optionally used to enhance playback, such as dynamic range control. Finally, the exponents and bit allocation data needed to decode the mantissas into audio data, and the mantissas themselves, are included. The format of audio blocks is described in detail in [ETSI].


2.1.2. Programs and Substreams

An E-AC-3 bit stream is logically arranged into programs. A bit stream contains one or more programs, up to a maximum of eight. When multiple programs are present in a bit stream, the frames that constitute them are interleaved in time.


     +----------+-     -+----------+----------+-     -+----------+-
     |Program(1)|  ...  |Program(N)|Program(1)|  ...  |Program(N)| ...
     | Frame 0  |       | Frame 0  | Frame 1  |       | Frame 1  |
     +----------+-     -+----------+----------+-     -+----------+-

Figure 2. Interleaving of multiple programs in an E-AC-3 bit stream


Each program contains one independent substream and optionally contains up to eight dependent substreams. The independent substream carries a soundtrack of up to 5.1 channels, the multichannel format that matches the capabilities of AC-3, and can be meaningfully decoded and presented without any of the associated dependent substreams. The dependent substreams are used to provide alternate channel data that enable different channel configurations, for example, to increase the number of channels beyond 5.1. A frame of a dependent substream can be decoded by itself, but its content can only be meaningfully presented in conjunction with the corresponding independent substream. The type and identity of the substream to which a frame belongs can be determined from parameters in the frame's BSI (strmtyp and substreamid, in Section E.1.3.1 of [ETSI]).


When a program contains more than one substream, the frames belonging to those substreams are interleaved in time, and taken together, the frames of a program that correspond to the same time period are called a 'program set'. Figure 3 shows the interleaving of substreams for a single program.


     / --------- program set for frame 0 ------- \
     :                                           :
   +-------------+-------------+-   -+-------------+-------------+-
   |  Program(1) |  Program(1) |     |  Program(1) |  Program(1) |
   | Independent |  Dependent  | ... |  Dependent  | Independent | ...
   |  Substream  | Substream(0)|     | Substream(n)|  Substream  |
   |   Frame 0   |   Frame 0   |     |   Frame 0   |   Frame 1   |
   +-------------+-------------+-   -+-------------+-------------+-

Figure 3. Interleaving of multiple substreams in an E-AC-3 program
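Because each program set begins with a frame of an independent substream and continues with that program's dependent substream frames, a receiver can recover program sets from the frame sequence alone. The following Python sketch shows this. It assumes that strmtyp and substreamid have already been extracted from each frame's BSI, that strmtyp values 0 and 2 denote independent substreams and 1 denotes a dependent substream, and that the program number is the independent substream's substreamid plus one. The names program_sets and Frame are hypothetical.

```python
from typing import List, Tuple

INDEPENDENT_TYPES = {0, 2}  # strmtyp values treated as independent (assumption)
DEPENDENT = 1

Frame = Tuple[int, int, bytes]  # (strmtyp, substreamid, frame bytes)


def program_sets(frames: List[Frame]) -> List[List[Frame]]:
    """Start a new program set at every independent frame; attach
    dependent frames to the set that precedes them."""
    sets: List[List[Frame]] = []
    for frame in frames:
        strmtyp = frame[0]
        if strmtyp in INDEPENDENT_TYPES:
            sets.append([frame])
        elif strmtyp == DEPENDENT:
            if not sets:
                raise ValueError("dependent frame before any independent frame")
            sets[-1].append(frame)
        else:
            raise ValueError(f"reserved strmtyp {strmtyp}")
    return sets


frames = [(0, 0, b"a"), (1, 0, b"b"), (1, 1, b"c"),  # program 1, frame 0
          (0, 1, b"d"), (1, 0, b"e"),                # program 2, frame 0
          (0, 0, b"f")]                              # program 1, frame 1
print([len(s) for s in program_sets(frames)])        # [3, 2, 1]
```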


2.1.3. Frame Sets

A further logical organization of the E-AC-3 bit stream is applied to facilitate conversion of E-AC-3 bit streams to AC-3 bit streams. In this organization, the frames carrying six consecutive audio blocks are treated as a group, called a 'frame set', regardless of the number of frames needed to carry six audio blocks. This grouping extends across all programs and substreams that cover the time period of the six blocks. Since E-AC-3 frames may carry one, two, three, or six blocks, a frame set will consist of six, three, two, or one frames. AC-3 frames always carry six blocks, so the frame set provides framing synchronization between an E-AC-3 bit stream and an AC-3 bit stream. Metadata that indicates the alignment is carried in the first frame (which will be part of an independent substream) of each frame set in an E-AC-3 stream. This first frame can be identified by a parameter in the BSI field of the bit stream: the Converter Synchronization flag (convsync, in Section E. of [ETSI]) is set to true (1).
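The following Python sketch shows the arithmetic of frame sets: within one substream, consecutive frames are accumulated until they total six audio blocks. It assumes that the caller has already extracted each frame's block count (for example, from 'numblkscod') and starts at a frame whose convsync flag is set. Rejecting a frame that would cross a six-block boundary is a choice made for this sketch. The name frame_sets is hypothetical.

```python
from typing import List

BLOCKS_PER_FRAME_SET = 6  # an AC-3 frame always carries six blocks


def frame_sets(block_counts: List[int]) -> List[List[int]]:
    """Group frame indices of one substream into frame sets of six blocks."""
    sets: List[List[int]] = []
    current: List[int] = []
    blocks = 0
    for i, n in enumerate(block_counts):
        if n not in (1, 2, 3, 6):
            raise ValueError(f"frame {i}: illegal block count {n}")
        current.append(i)
        blocks += n
        if blocks == BLOCKS_PER_FRAME_SET:
            sets.append(current)
            current, blocks = [], 0
        elif blocks > BLOCKS_PER_FRAME_SET:
            raise ValueError(f"frame {i} crosses a frame-set boundary")
    if current:
        raise ValueError("trailing frames do not complete a frame set")
    return sets


print(frame_sets([2, 2, 2, 6, 3, 3]))  # [[0, 1, 2], [3], [4, 5]]
```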


3. RTP E-AC-3 Header Fields

The RTP header is defined in the RTP specification [RFC3550]. This section defines how a number of fields in the header are used.


o Payload Type (PT): The assignment of an RTP payload type for this packet format is outside the scope of this document; it is specified by the RTP profile under which this payload format is used, or signaled dynamically out-of-band (e.g., using SDP).


o Marker (M) bit: The M bit is set to one to indicate that the RTP packet payload contains at least one complete E-AC-3 frame or contains the final fragment of an E-AC-3 frame.


o Extension (X) bit: Defined by the RTP profile used.


o Timestamp: A 32-bit word that corresponds to the sampling instant for the first E-AC-3 frame in the RTP packet. Packets containing fragments of the same frame MUST have the same timestamp. The timestamp of the first RTP packet sent SHOULD be selected at random; thereafter, it increases linearly according to the number of samples included in each frame. Note that the number of samples in a frame depends on the number of blocks in the frame, with 256 samples in each block. Also note that more than one frame might correspond to the same time period when multiple channel configurations or programs are present. If these frames occupy multiple packets, it is possible that the resulting packets will have the same timestamp value.

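The timestamp rule above (a random start, then an increase of 256 samples per audio block, modulo 2^32) can be sketched as follows for the frames of a single substream. In this Python sketch, frames belonging to the same program set cover the same time period and would reuse the timestamp of their independent frame. The optional start argument exists only to make the output reproducible, and the name rtp_timestamps is hypothetical.

```python
import secrets
from typing import List, Optional

SAMPLES_PER_BLOCK = 256


def rtp_timestamps(blocks_per_frame: List[int], start: Optional[int] = None) -> List[int]:
    """Timestamp of each consecutive frame of one substream."""
    ts = secrets.randbits(32) if start is None else start  # random initial value
    out = []
    for blocks in blocks_per_frame:
        out.append(ts)
        ts = (ts + blocks * SAMPLES_PER_BLOCK) % (1 << 32)  # 32-bit wraparound
    return out


print(rtp_timestamps([6, 6, 1], start=0))  # [0, 1536, 3072]
```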

4. RTP E-AC-3 Payload Format

This payload format is defined for E-AC-3, as defined in Annex E of [ETSI]. Note that E-AC-3 decoders are required to be capable of decoding AC-3 bit streams, so a receiver capable of receiving the E-AC-3 payload format defined in this document MUST also be capable of receiving the payload format for AC-3 defined in [RFC4184].


According to [RFC2736], RTP payload formats should contain an integral number of application data units (ADUs). The E-AC-3 frame corresponds to an ADU in the context of this payload format. Each RTP payload MUST start with the two-byte payload specific header followed by an integral number of complete E-AC-3 frames, or a single fragment of an E-AC-3 frame.


If an E-AC-3 frame exceeds the MTU for a network, it SHOULD be fragmented for transmission within an RTP packet. Section 4.2 provides guidelines for creating frame fragments.


4.1. Payload Specific Header

There is a two-octet Payload header at the beginning of each payload. Each E-AC-3 RTP payload MUST begin with the following Payload header.


                 0                   1
                 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |    MBZ      |F|       NF      |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 4. E-AC-3 RTP Payload header


o Must Be Zero (MBZ): Bits marked MBZ SHALL be set to the value zero and SHALL be ignored by receivers. The bits are reserved for future extensions.


o Frame Type (F): This one-bit field indicates the type of frame(s) present in the payload. It takes the following values:

     0 - One or more complete frames.

     1 - Fragment of frame. (Note that the M bit in the RTP header is set for the final fragment.)


o Number of frames/fragments (NF): An 8-bit field whose meaning depends on the Frame Type (F) in this payload. For complete frames (F of 0), it is used to indicate the number of E-AC-3 frames in the RTP payload. For frame fragments (F of 1), it is used to indicate the number of fragments (and therefore packets) that make up the current frame. NF MUST be identical for packets containing fragments of the same frame.

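The following Python sketch packs and unpacks the two-octet header as drawn in Figure 4: bit 0 is the most significant bit, bits 0 through 6 are MBZ, bit 7 is F, and bits 8 through 15 are NF. On receipt, the MBZ bits are masked off rather than checked, as the text requires. Rejecting NF = 0 is an assumption of this sketch, and the function names are hypothetical.

```python
import struct
from typing import Tuple

F_COMPLETE = 0  # one or more complete frames
F_FRAGMENT = 1  # fragment of a frame


def pack_header(f: int, nf: int) -> bytes:
    if f not in (F_COMPLETE, F_FRAGMENT):
        raise ValueError("F must be 0 or 1")
    if not 0 < nf <= 0xFF:
        raise ValueError("NF must fit in 8 bits and be non-zero")
    # MBZ bits (mask 0xFE00) are left at zero; F is bit 7 counted from the MSB.
    return struct.pack("!H", (f << 8) | nf)


def unpack_header(payload: bytes) -> Tuple[int, int]:
    (word,) = struct.unpack("!H", payload[:2])
    # MBZ bits are ignored by receivers.
    return (word >> 8) & 0x01, word & 0xFF


print(pack_header(F_FRAGMENT, 2))      # b'\x01\x02'
print(unpack_header(b"\xff\x02"))      # (1, 2)
```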

When receiving E-AC-3 payloads with F = 0 and more than a single frame (NF > 1), a receiver needs to use the "frmsiz" field in the BSI header in each E-AC-3 frame to determine the frame's length if the receiver needs to determine the boundary of the next frame. Note that the frame length varies from frame to frame in some circumstances.
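The following Python sketch shows a receiver using 'frmsiz' to walk a payload that carries several complete frames (F = 0). It assumes that the 2-byte payload header has already been stripped and that every frame is an E-AC-3 frame. AC-3 frames carried as described in Section 4.4 signal their size differently and are not handled. Cross-checking the frame count against NF is a choice made for this sketch, and the name split_frames is hypothetical.

```python
from typing import List


def split_frames(data: bytes, nf: int) -> List[bytes]:
    """Split concatenated E-AC-3 frames using each frame's frmsiz field."""
    frames: List[bytes] = []
    pos = 0
    while pos < len(data):
        if len(data) - pos < 4 or data[pos:pos + 2] != b"\x0b\x77":
            raise ValueError(f"no sync word at offset {pos}")
        frmsiz = ((data[pos + 2] & 0x07) << 8) | data[pos + 3]
        size = (frmsiz + 1) * 2  # frmsiz is the 16-bit word count minus one
        if pos + size > len(data):
            raise ValueError("truncated frame")
        frames.append(data[pos:pos + size])
        pos += size
    if len(frames) != nf:
        raise ValueError(f"NF says {nf} frames, found {len(frames)}")
    return frames


f1 = bytes([0x0B, 0x77, 0x00, 0x03, 0, 0, 0, 0])  # 4 words = 8 bytes
f2 = bytes([0x0B, 0x77, 0x00, 0x02, 0, 0])        # 3 words = 6 bytes
print([len(f) for f in split_frames(f1 + f2, 2)])  # [8, 6]
```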


4.2. Fragmentation of E-AC-3 Frames

The size of an E-AC-3 frame is signaled in the Frame Size (frmsiz) field in a frame's BSI header. The value of this field is one less than the number of 16-bit words in the frame. If the size of an E-AC-3 frame exceeds the MTU size, the frame SHOULD be fragmented at the RTP level. The fragmentation MAY be performed at any byte boundary in the frame. RTP packets containing fragments of the same E-AC-3 frame SHALL be sent in consecutive order, from first to last fragment. This enables a receiver to assemble the fragments in the correct order.
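The following Python sketch shows the fragmentation rule on the sender side. A frame that does not fit is cut at byte boundaries, each fragment is prefixed with a header carrying F = 1 and NF equal to the fragment count, and only the final fragment is flagged for the RTP M bit. A frame that fits is sent whole with F = 0 and NF = 1. The max_payload argument stands for the space left after IP, UDP, and RTP headers, which is an assumption about how the caller computes it. Setting the shared RTP timestamp and sending the packets in order are left to the caller. The name fragment is hypothetical.

```python
import struct
from typing import List, Tuple


def fragment(frame: bytes, max_payload: int) -> List[Tuple[bytes, bool]]:
    """Return (payload, marker_bit) pairs for one E-AC-3 frame."""
    room = max_payload - 2  # space left after the payload header
    if room <= 0:
        raise ValueError("max_payload too small")
    if len(frame) <= room:
        return [(struct.pack("!H", 1) + frame, True)]  # F=0, NF=1, complete frame
    chunks = [frame[i:i + room] for i in range(0, len(frame), room)]
    if len(chunks) > 0xFF:
        raise ValueError("frame needs more than 255 fragments")
    header = struct.pack("!H", (1 << 8) | len(chunks))  # F=1, NF=fragment count
    # Fragments are emitted in order; the M bit is set only on the last one.
    return [(header + c, i == len(chunks) - 1) for i, c in enumerate(chunks)]


for payload, marker in fragment(bytes(range(10)), 6):
    print(payload[:2], len(payload) - 2, marker)
```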


4.3. Concatenation of E-AC-3 Frames

There are cases where E-AC-3 frame sizes are smaller than the MTU size and it is advantageous to include multiple frames in a packet.


It is useful to take into account the logical arrangement of the bit stream into program sets and frame sets to constrain the effects of the loss of a packet. It is desirable for a complete program set or a complete frame set to be included in one packet. Also, it is undesirable for frames from more than one program set or frame set to be in the same packet, unless the sets are complete. In this way, the loss of a packet is kept from causing the contents of another packet to be unusable.


Frames from more than one program set SHOULD NOT be included in the same packet unless all program sets in the packet are complete. Frames from more than one frame set SHOULD NOT be included in the same packet unless all frame sets in the packet are complete.
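One way a sender might follow the program-set rule is sketched below in Python: whole program sets are combined greedily while they fit, and a set that does not fit in an empty packet gets packets of its own, so an incomplete set never shares a packet with another set. The greedy policy is only one possible choice. The budget is assumed to be the path MTU minus the IP, UDP, RTP, and payload headers. The sketch considers program sets only (not frame sets) and does not enforce the 255-frame limit of NF. A single frame larger than the budget still needs the fragmentation described in Section 4.2. The name pack_program_sets is hypothetical.

```python
from typing import List


def pack_program_sets(program_sets: List[List[bytes]], budget: int) -> List[List[bytes]]:
    packets: List[List[bytes]] = []
    current: List[bytes] = []  # frames of whole program sets collected so far
    used = 0
    for frames in program_sets:
        size = sum(len(f) for f in frames)
        if size <= budget:
            # The whole set fits in a packet: start a new packet if needed.
            if used + size > budget:
                packets.append(current)
                current, used = [], 0
            current.extend(frames)
            used += size
            continue
        # Oversized set: flush, then spread its frames over packets that
        # carry no other program set.
        if current:
            packets.append(current)
            current, used = [], 0
        part: List[bytes] = []
        part_used = 0
        for f in frames:
            if part and part_used + len(f) > budget:
                packets.append(part)
                part, part_used = [], 0
            part.append(f)  # a lone frame larger than budget must be fragmented
            part_used += len(f)
        packets.append(part)
    if current:
        packets.append(current)
    return packets


sets = [[b"a" * 3, b"b" * 2], [b"c" * 4], [b"d" * 6, b"e" * 5]]
print([[len(f) for f in p] for p in pack_program_sets(sets, 10)])  # [[3, 2, 4], [6], [5]]
```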


4.4. Carriage of AC-3 Frames

The E-AC-3 specification [ETSI] requires that E-AC-3 decoders be capable of decoding AC-3 frames. That specification also supports carriage of AC-3 frames in an E-AC-3 bit stream. Due to differences between E-AC-3 and AC-3 frames, there are restrictions placed on the use of AC-3 frames: they are only used for the independent substream of the first (or only) program in an E-AC-3 bit stream. Note that carriage of only E-AC-3 frames, only AC-3 frames, and a mixture of E-AC-3 and AC-3 frames are all legal configurations. It is legal to change among the configurations in a bit stream. The AC-3 frame format is described in [RFC4184] and specified in [ETSI].


5. Types and Names
5.1. Media Type Registration

This registration uses the template defined in [RFC4288] and follows [RFC3555].


   Subject: Registration of media type audio/eac3

Type name: audio


Subtype name: eac3


Required parameter:


o rate: The RTP timestamp clock rate that is equal to the audio sampling rate. Permitted rates are 32000, 44100, and 48000.


Optional parameter:


o bitStreamConfig: The configuration of programs and substreams in the bit stream, expressed as a sequence of ASCII characters. This parameter can serve two purposes. First, during the creation of a session, the bitStreamConfig parameter might be used to negotiate a match between the requirements of a bit stream and the capabilities of a receiver to avoid using network bandwidth for data that cannot be used. Second, it makes the configuration of the bit stream explicit to the receiver so that whenever a packet is lost, the receiver can identify which kind of frame(s) has been lost to aid error mitigation.


The format for the value for this parameter is to represent each substream of the bit stream by a single character indicating its type, immediately followed by the number of audio channels resulting if a frame of that substream (plus any other required substreams) is decoded. Note that even though Low-Frequency Effects (LFE) channels are often described as "fractional" channels (e.g., the ".1" in 5.1), for this parameter, an LFE channel is counted as one (e.g., a 5.1-channel configuration is indicated as 6). The configuration of the bit stream MUST match the value of this parameter for the duration of the session.


Allowed values for the substream type are as follows:


i - Independent substream.

d - Dependent substream.


The E-AC-3 specification [ETSI] defines which configurations of bit streams are legal, which constrains the values the bitStreamConfig parameter will take. Each program starts with, and contains exactly one, independent substream ('i'). Each independent substream is followed by between 0 and 8 dependent substreams ('d'), which belong to the same program. See Section 2.1.2 for more discussion of programs and substreams.
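The following Python sketch parses a bitStreamConfig value into programs and checks it against the constraints stated in this document: each program starts with exactly one 'i'; each program has at most eight 'd' substreams; there are at most eight programs (Section 2.1.2); and an independent substream has at most six channels (5.1). A channel count of 0 is accepted because answers use it to decline a substream. The name parse_bitstream_config is hypothetical, and this check is not a full validation against [ETSI].

```python
import re
from typing import List, Tuple

TOKEN = re.compile(r"([id])(\d+)")


def parse_bitstream_config(value: str) -> List[List[Tuple[str, int]]]:
    if not re.fullmatch(r"(?:[id]\d+)+", value):
        raise ValueError(f"malformed bitStreamConfig: {value!r}")
    programs: List[List[Tuple[str, int]]] = []
    for kind, count in TOKEN.findall(value):
        if kind == "i":
            programs.append([("i", int(count))])  # each 'i' starts a new program
        elif not programs:
            raise ValueError("value must start with an independent substream")
        else:
            programs[-1].append(("d", int(count)))
    if len(programs) > 8:
        raise ValueError("more than eight programs")
    for p in programs:
        if len(p) > 9:
            raise ValueError("more than eight dependent substreams in a program")
        if p[0][1] > 6:
            raise ValueError("independent substream exceeds 6 channels (5.1)")
    return programs


print(parse_bitstream_config("i6d8d14i6d8"))
```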


For example, consider a bit stream containing two programs:


* the first program with

   + a six-channel independent substream
   + a dependent substream containing the additional channels needed for eight channels
   + a second dependent substream containing the further channels needed for 14 channels

* along with a second program with

   + another six-channel independent substream
   + a dependent substream containing the additional channels needed for eight channels

Then the configuration of the bit stream is indicated as follows:


      bitStreamConfig = i6d8d14i6d8

When the bitStreamConfig parameter is being used in an offer/answer exchange, zero (0) for the number of channels for a substream in an answer is used to indicate a substream that the answerer desires not to receive.


Encoding considerations:


This media type is framed and contains binary data.


Security considerations:


See Section 6 of RFC 4598.


Interoperability considerations:


To maintain interoperability with AC-3-capable end-points, in cases where negotiation is possible, an E-AC-3 end-point SHOULD declare itself also as AC-3 capable (i.e., also supporting "audio/ac3" as specified in RFC 4184 [RFC4184]). Note that all E-AC-3 end-points are required to be AC-3 capable.


Published specification:


RFC 4598 and ETSI TS 102 366 [ETSI].


Applications that use this media type:


Compression of multichannel audio, for both audio-only and audio-for-video applications.


Additional information:


Magic number(s): The first two octets of an E-AC-3 frame are always the synchronization word, which has the hex value 0x0B77.


Person & email address to contact for further information:


Brian Link <> IETF AVT working group.


Intended usage:




Restrictions on usage:


This media type depends on RTP framing, and hence is only defined for transfer via RTP [RFC3550]. Transport within other framing protocols is not defined at this time.


Author/Change controller:


IETF Audio/Video Transport Working Group delegated from the IESG.


5.2. SDP Usage

The information carried in the media type specification has a specific mapping to fields in the Session Description Protocol (SDP) [RFC2327], which is commonly used to describe RTP sessions. When SDP is used to specify sessions employing E-AC-3, the mapping is as follows:


o The Media type ("audio") goes in SDP "m=" as the media name.

o The Media subtype ("eac3") goes in SDP "a=rtpmap" as the encoding name.

o The required parameter "rate" also goes in "a=rtpmap" as the clock rate. (The optional "channels" rtpmap encoding parameter is not used. Instead, the information is included in the optional parameter bitStreamConfig.)

o The optional parameter "bitStreamConfig" goes in the SDP "a=fmtp" attribute.

The following is an example of the SDP data for E-AC-3:


         m=audio 49111 RTP/AVP 100
         a=rtpmap:100 eac3/48000
         a=fmtp:100 bitStreamConfig=i6d8d14i6d8
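The mapping above can be sketched as a small Python function that builds the media-level SDP lines for an E-AC-3 session. The function name and its arguments are hypothetical. It writes the fmtp parameter in the parameter=value form that RFC 3555 [RFC3555] uses when mapping media type parameters into "a=fmtp", and it checks for a dynamic RTP payload type (96-127), since no static payload type is assigned for E-AC-3.

```python
from __future__ import annotations


def eac3_sdp_lines(port: int, payload_type: int, rate: int,
                   bitstream_config: str | None = None) -> list[str]:
    """Media-level SDP lines for an E-AC-3 RTP session (illustrative sketch)."""
    if not 96 <= payload_type <= 127:
        raise ValueError("expected a dynamic RTP payload type (96-127)")
    lines = [
        f"m=audio {port} RTP/AVP {payload_type}",  # media type -> "m=" media name
        f"a=rtpmap:{payload_type} eac3/{rate}",     # subtype + clock rate; no channels field
    ]
    if bitstream_config:                            # optional parameter -> "a=fmtp"
        lines.append(f"a=fmtp:{payload_type} bitStreamConfig={bitstream_config}")
    return lines
```

Calling eac3_sdp_lines(49111, 100, 48000, "i6d8d14i6d8") yields the three lines of the example above, written with "=" between the parameter name and its value. Omitting the last argument leaves out the "a=fmtp" line, since bitStreamConfig is optional.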

The following considerations apply when SDP is used in offer/answer exchanges [RFC3264]:


o The "rate" parameter is symmetric: the answer MUST use the same value, or else the answerer removes the payload type from the answer.

o The "bitStreamConfig" parameter is declarative. For sendonly streams, it indicates the intended arrangement of substreams in the bit stream to be transmitted, along with their channel configuration; for recvonly or sendrecv streams, it indicates the bit stream arrangement and channel configuration the end-point desires to receive. The bitStreamConfig value in an answer MAY differ from the offered value by replacing the number of channels for any undesired substream with '0'. It is valid to zero out dependent substreams containing undesired channel configurations and to zero out all the substreams of an undesired program. The sender MAY then re-offer the stream in the receiver's preferred configuration, if it is capable of providing that configuration. Note that all receivers are capable of receiving, and all decoders are capable of decoding, any legal bit stream configuration, so the parameter exchange is not needed for interoperability; it can, however, help the sender tailor the transmission to the number of programs or channels the receiver requests.

o Since an AC-3 bit stream is a special case of an E-AC-3 bit stream, it is permissible for an AC-3 bit stream to be carried in the E-AC-3 payload format. To ensure interoperability with receivers that support the AC-3 payload format but not the E-AC-3 payload format, a sender that desires to send an AC-3 bit stream in the E-AC-3 payload format SHOULD also offer the session in the AC-3 payload format by including payload types for both media subtypes: 'ac3' and 'eac3'.

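The offer/answer rules above can be sketched in Python as follows. All function names are hypothetical, and the same 'i'/'d' plus channel-count token grammar is assumed for bitStreamConfig. Substreams are indexed from 0 in order of appearance, and programs from 0 in order of their independent substreams. Only dependent substreams may be zeroed individually, while an unwanted program is zeroed as a whole. Listing 'eac3' ahead of 'ac3' in the dual-format offer expresses a preference, since [RFC3264] treats the order of formats in the "m=" line as preference order; that ordering is a choice made here, not a requirement of this document.

```python
from __future__ import annotations

import re

# Assumed token grammar: 'i' or 'd' followed by a decimal channel count.
_TOKEN = re.compile(r"([id])(\d+)")


def answer_rate(offered_rate: int, supported_rates: set[int]) -> int | None:
    """'rate' is symmetric: echo the offered value, or None to drop the payload type."""
    return offered_rate if offered_rate in supported_rates else None


def answer_bitstream_config(offer: str,
                            drop_substreams: set[int] | frozenset[int] = frozenset(),
                            drop_programs: set[int] | frozenset[int] = frozenset()) -> str:
    """Copy the offered bitStreamConfig, writing 0 channels for unwanted substreams."""
    tokens = _TOKEN.findall(offer)
    if not tokens or "".join(kind + count for kind, count in tokens) != offer:
        raise ValueError(f"malformed bitStreamConfig: {offer!r}")
    program = -1
    parts = []
    for index, (kind, count) in enumerate(tokens):
        if kind == "i":
            program += 1  # each independent substream starts a new program
        elif program < 0:
            raise ValueError("a dependent substream must follow an independent one")
        if index in drop_substreams:
            if kind != "d":
                raise ValueError("only dependent substreams may be zeroed individually")
            count = "0"
        if program in drop_programs:
            count = "0"  # zero every substream of an unwanted program
        parts.append(kind + count)
    return "".join(parts)


def dual_format_offer(port: int, eac3_pt: int, ac3_pt: int, rate: int) -> list[str]:
    """Offer one AC-3 bit stream under both the 'eac3' and 'ac3' payload formats."""
    return [
        f"m=audio {port} RTP/AVP {eac3_pt} {ac3_pt}",  # formats in order of preference
        f"a=rtpmap:{eac3_pt} eac3/{rate}",
        f"a=rtpmap:{ac3_pt} ac3/{rate}",
    ]
```

For the offered value i6d8d14i6d8, dropping substream 2 gives i6d8d0i6d8, and dropping program 1 gives i6d8d14i0d0.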

6. Security Considerations

The payload format described in this document is subject to the security considerations defined in RTP [RFC3550] and in any applicable RTP profile (e.g., [RFC3551]). To protect the user's privacy and any copyrighted material, confidentiality protection would have to be applied. To also protect against modification by intermediate entities and ensure the authenticity of the stream, integrity protection and authentication would be required. Confidentiality, integrity protection, and authentication have to be solved by a mechanism external to this payload format, for example, Secure Real-time Transport Protocol (SRTP) [RFC3711].


The E-AC-3 format is designed so that the validity of data frames can be determined by decoders. The required decoder response to a malformed frame is to discard the malformed data and conceal the errors in the audio output until a valid frame is detected and decoded. This is expected to prevent crashes and other abnormal decoder behavior in response to errors or attacks.


7. Congestion Control

The general congestion control considerations for transporting RTP data apply to E-AC-3 audio over RTP as well; see RTP [RFC3550], and any applicable RTP profile (e.g., [RFC3551]).


E-AC-3 is a variable-bit-rate coding system, so a sender can use a variety of techniques to adapt its bit rate to the available network bandwidth.


8. IANA Considerations

The IANA has registered a new media subtype for E-AC-3 (see Section 5).


9. References
9.1. Normative References

[ETSI] ETSI, "Digital Audio Compression (AC-3, Enhanced AC-3) Standard", TS 102 366, February 2005.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC4184] Link, B., Hager, T., and J. Flaks, "RTP Payload Format for AC-3 Audio", RFC 4184, October 2005.

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[RFC4288] Freed, N. and J. Klensin, "Media Type Specifications and Registration Procedures", BCP 13, RFC 4288, December 2005.

[RFC3555] Casner, S. and P. Hoschka, "MIME Type Registration of RTP Payload Formats", RFC 3555, July 2003.

[RFC2327] Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.

9.2. Informative References

[2004AES] Fielder, L., Andersen, R., Crockett, B., Davidson, G., Davis, M., Turner, S., Vinton, M., and P. Williams, "Introduction to Dolby Digital Plus, an Enhancement to the Dolby Digital Coding System", Preprint 6196, Presented at the 117th Convention of the Audio Engineering Society, October 2004.


[1994AES] Todd, C., Davidson, G., Davis, M., Fielder, L., Link, B., and S. Vernon, "AC-3: Flexible Perceptual Coding for Audio Transmission and Storage", Preprint 3796, Presented at the 96th Convention of the Audio Engineering Society, May 1994.


[RFC2736] Handley, M. and C. Perkins, "Guidelines for Writers of RTP Payload Format Specifications", BCP 36, RFC 2736, December 1999.

[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.

Author's Address


   Brian Link
   Dolby Laboratories
   100 Potrero Ave.
   San Francisco, CA 94103
   US

   Phone: +1 415 558 0200

Full Copyright Statement


Copyright (C) The Internet Society (2006).


This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.



Intellectual Property


The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at


The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at




Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).