Network Working Group                                              A. Li
Request for Comments: 3558                                          UCLA
Category: Standards Track                                      July 2003
RTP Payload Format for Enhanced Variable Rate Codecs (EVRC) and Selectable Mode Vocoders (SMV)
Status of this Memo
This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2003). All Rights Reserved.
Abstract
This document describes the RTP payload format for Enhanced Variable Rate Codec (EVRC) Speech and Selectable Mode Vocoder (SMV) Speech. Two sub-formats are specified for different application scenarios. A bundled/interleaved format is included to reduce the effect of packet loss on speech quality and amortize the overhead of the RTP header over more than one speech frame. A non-bundled format is also supported for conversational applications.
Table of Contents
   1.  Introduction
   2.  Background
   3.  The Codecs Supported
       3.1.  EVRC
       3.2.  SMV
       3.3.  Other Frame-Based Vocoders
   4.  RTP/Vocoder Packet Format
       4.1.  Interleaved/Bundled Packet Format
       4.2.  Header-Free Packet Format
       4.3.  Determining the Format of Packets
   5.  Packet Table of Contents Entries and Codec Data Frame Format
       5.1.  Packet Table of Contents entries
       5.2.  Codec Data Frames
   6.  Interleaving Codec Data Frames
   7.  Bundling Codec Data Frames
   8.  Handling Missing Codec Data Frames
   9.  Implementation Issues
       9.1.  Interleaving Length
       9.2.  Validation of Received Packets
       9.3.  Processing the Late Packets
   10. Mode Request
   11. Storage Format
   12. IANA Considerations
       12.1.  Registration of Media Type EVRC
       12.2.  Registration of Media Type EVRC0
       12.3.  Registration of Media Type SMV
       12.4.  Registration of Media Type SMV0
   13. Mapping to SDP Parameters
   14. Security Considerations
   15. Adding Support of Other Frame-Based Vocoders
   16. Acknowledgements
   17. References
       17.1  Normative
       17.2  Informative
   18. Author's Address
   19. Full Copyright Statement
This document describes how speech compressed with EVRC [1] or SMV [2] may be formatted for use as an RTP payload type. The format is also extensible to other codecs that generate a similar set of frame types. Two methods are provided to packetize the codec data frames into RTP packets: an interleaved/bundled format and a zero-header format. The sender may choose the best format for each application scenario, based on network conditions, bandwidth availability, delay requirements, and packet-loss tolerance.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [3].
The 3rd Generation Partnership Project 2 (3GPP2) has published two standards which define speech compression algorithms for CDMA applications: EVRC [1] and SMV [2]. EVRC is currently deployed in millions of first and second generation CDMA handsets. SMV is the preferred speech codec standard for CDMA2000, and will be deployed in third generation handsets in addition to EVRC. Improvements and new codecs will keep emerging as technology improves, and future handsets will likely support multiple codecs.
The formats of the EVRC and SMV codec frames are very similar. Many other vocoders also share common characteristics, and have many similar application scenarios. This parallelism enables an RTP payload format to be designed for EVRC and SMV that may also support other, similar vocoders with minimal additional specification work. This can simplify the protocol for transporting vocoder data frames through RTP and reduce the complexity of implementations.
The Enhanced Variable Rate Codec (EVRC) [1] compresses each 20 milliseconds of 8000 Hz, 16-bit sampled speech input into an output frame of one of three sizes: Rate 1 (171 bits), Rate 1/2 (80 bits), or Rate 1/8 (16 bits). In addition, there are two zero-bit codec frame types: null frames and erasure frames. Null frames are produced when the vocoder runs at rate 0; they are zero bits long and are normally not transmitted. Erasure frames are frames that the receiver substitutes, as input to the codec, for lost or damaged frames. Erasure frames are also zero bits long and are normally not transmitted.
The codec chooses the output frame rate based on analysis of the input speech and the current operating mode (either normal or one of several reduced rate modes). For typical speech patterns, this results in an average output of 4.2 kilobits/second for normal mode and a lower average output for reduced rate modes.
The Selectable Mode Vocoder (SMV) [2] compresses each 20 milliseconds of 8000 Hz, 16-bit sampled speech input into an output frame of one of four sizes: Rate 1 (171 bits), Rate 1/2 (80 bits), Rate 1/4 (40 bits), or Rate 1/8 (16 bits). In addition, there are two zero-bit codec frame types: null frames and erasure frames. Null frames are produced when the vocoder runs at rate 0; they are zero bits long and are normally not transmitted. Erasure frames are frames that the receiver substitutes, as input to the codec, for lost or damaged frames. Erasure frames are also zero bits long and are normally not transmitted.
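As a non-normative illustration, the instantaneous bit rate of each frame type follows directly from the frame size and the 20 millisecond frame duration. The names `FRAME_BITS` and `bit_rate_bps` below are chosen for this sketch only and are not defined by this document.

```python
# Frame sizes in bits, as listed above (Rate 1/4 exists in SMV only).
FRAME_BITS = {"1": 171, "1/2": 80, "1/4": 40, "1/8": 16}
FRAME_DURATION_MS = 20  # each codec frame covers 20 ms of speech


def bit_rate_bps(rate: str) -> float:
    """Instantaneous bit rate, in bits per second, of one frame type."""
    return FRAME_BITS[rate] * 1000 / FRAME_DURATION_MS


for rate in FRAME_BITS:
    print(f"Rate {rate}: {bit_rate_bps(rate):.0f} bit/s")
```

A continuous run of Rate 1 frames corresponds to 8550 bit/s, so an average output near 4.2 kilobits/second reflects a mix of frame rates rather than sustained full-rate output.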
The SMV codec can operate in six modes. Each mode may produce frames of any of the rates (full rate to 1/8 rate) for varying percentages of time, based on the characteristics of the speech samples and the selected mode. The SMV mode can change on a frame-by-frame basis. The SMV codec does not need additional information other than the codec data frames to correctly decode the data of various modes; therefore, the mode of the encoder does not need to be transmitted with the encoded frames.
The SMV codec chooses the output frame rate based on analysis of the input speech and the current operating mode. For typical speech patterns, this results in an average output of 4.2 kilobits/second for Mode 0 in two way conversation (approximately 50% active speech time and 50% in eighth rate while listening) and lower for other reduced rate modes. SMV is more bandwidth efficient than EVRC. EVRC is equivalent in performance to SMV mode 1.
Other frame-based vocoders can be carried in the packet format defined in this document, as long as they possess the following properties:
   o  The codec is frame-based;

   o  blank and erasure frames are supported;

   o  the total number of rates is less than 17;

   o  the maximum full rate frame can be transported in a single RTP
      packet using this specific format.
Vocoders with the characteristics listed above can be transported using the packet format specified in this document with some additional specification work; the pieces that must be defined are listed in Section 15.
The vocoder speech data may be transmitted in either of the two RTP packet formats specified in the following two subsections, as appropriate for the application scenario. In the packet format diagrams shown in this document, bit 0 is the most significant bit.
This format is used to send one or more vocoder frames per packet. Interleaving or bundling MAY be used. The RTP packet for this format is as follows:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      RTP Header [4]                           |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |R|R| LLL | NNN | MMM |  Count  |  TOC  |  ...  |  TOC  |padding|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        one or more codec data frames, one per TOC entry       |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The RTP header has the expected values as described in the RTP specification [4]. The RTP timestamp is in 1/8000 of a second units for EVRC and SMV. For any other vocoders that use this packet format, the timestamp unit needs to be defined explicitly. The M bit should be set as specified in the applicable RTP profile, for example, RFC 3551 [5]. Note that RFC 3551 [5] specifies that if the sender does not suppress silence, the M bit will always be zero. When multiple codec data frames are present in a single RTP packet, the timestamp is that of the oldest data represented in the RTP packet. The assignment of an RTP payload type for this packet format is outside the scope of this document; it is specified by the RTP profile under which this payload format is used.
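As a non-normative illustration of the timestamp rule, one 20 millisecond frame at the 8000 Hz clock spans 160 timestamp units. The sketch below assumes a continuous stream that is not interleaved (bundled or single-frame packets, no silence suppression); the helper name `next_timestamp` is hypothetical.

```python
CLOCK_RATE_HZ = 8000      # RTP timestamp unit for EVRC and SMV is 1/8000 s
FRAME_DURATION_MS = 20    # one codec data frame covers 20 ms of speech
TICKS_PER_FRAME = CLOCK_RATE_HZ * FRAME_DURATION_MS // 1000  # 160


def next_timestamp(timestamp: int, frames_in_packet: int) -> int:
    """Timestamp of the following packet in a continuous, non-interleaved stream.

    A packet carries the timestamp of its oldest frame, so the next packet
    starts frames_in_packet frame durations later (RTP timestamps wrap
    modulo 2**32).
    """
    return (timestamp + frames_in_packet * TICKS_PER_FRAME) % 2**32
```

For example, a packet bundling three frames and stamped 0 is followed by a packet stamped 480.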
The first octet of an Interleaved/Bundled format packet is the Interleave Octet. The second octet contains the Mode Request and Frame Count fields. The Table of Contents (ToC) field then follows. The fields are specified as follows:
Reserved (RR): 2 bits
   Reserved bits. MUST be set to zero by sender, SHOULD be ignored by receiver.
Interleave Length (LLL): 3 bits
   Indicates the length of interleave; a value of 0 indicates bundling, a special case of interleaving. See Section 6 and Section 7 for more detailed discussion.
Interleave Index (NNN): 3 bits
   Indicates the index within an interleave group. MUST have a value less than or equal to the value of LLL. Values of NNN greater than the value of LLL are invalid. Packets with invalid NNN values SHOULD be ignored by the receiver.
Mode Request (MMM): 3 bits
   The Mode Request field is used to signal Mode Request information. See Section 10 for details.
Frame Count (Count): 5 bits
   The number of ToC fields (and vocoder frames) present in the packet is the value of the frame count field plus one. A value of zero indicates that the packet contains one ToC field, while a value of 31 indicates that the packet contains 32 ToC fields.
Padding (padding): 0 or 4 bits
   This padding ensures that codec data frames start on an octet boundary. When the frame count is odd, the sender MUST add 4 bits of padding following the last TOC. When the frame count is even, the sender MUST NOT add padding bits. If padding is present, the padding bits MUST be set to zero by sender, and SHOULD be ignored by receiver.
The Table of Contents field (ToC) provides information on the codec data frame(s) in the packet. There is one ToC entry for each codec data frame. The detailed formats of the ToC field and codec data frames are specified in Section 5.
Multiple data frames may be included within an Interleaved/Bundled packet using interleaving or bundling as described in Section 6 and Section 7.
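The field layout described in this section can be sketched as a small parser. This is a non-normative illustration operating on the RTP payload (the octets after the RTP header); `PayloadHeader` and `parse_payload_header` are names invented for this sketch. It raises an error for an invalid NNN value so that the caller can ignore the packet.

```python
from typing import NamedTuple


class PayloadHeader(NamedTuple):
    interleave_length: int   # LLL
    interleave_index: int    # NNN
    mode_request: int        # MMM
    frame_types: list[int]   # one 4-bit ToC entry per codec data frame
    data_offset: int         # octet offset of the first codec data frame


def parse_payload_header(payload: bytes) -> PayloadHeader:
    """Parse the Interleave Octet, the MMM/Count octet, and the ToC entries."""
    if len(payload) < 2:
        raise ValueError("payload too short for the two header octets")
    lll = (payload[0] >> 3) & 0x07     # bits 2-4 of the first octet
    nnn = payload[0] & 0x07            # bits 5-7 of the first octet
    if nnn > lll:
        raise ValueError("invalid NNN: greater than LLL")
    mmm = (payload[1] >> 5) & 0x07     # bits 0-2 of the second octet
    count = (payload[1] & 0x1F) + 1    # Count field plus one
    toc_octets = (count + 1) // 2      # two entries per octet, padded if odd
    if len(payload) < 2 + toc_octets:
        raise ValueError("payload too short for the ToC")
    frame_types = []
    for i in range(count):
        octet = payload[2 + i // 2]
        # Even-numbered entries sit in the high nibble, odd ones in the low.
        frame_types.append(octet >> 4 if i % 2 == 0 else octet & 0x0F)
    return PayloadHeader(lll, nnn, mmm, frame_types, 2 + toc_octets)
```

The reserved bits and any padding bits are simply skipped, in line with the guidance that receivers ignore them.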
The Header-Free Packet Format is designed for maximum bandwidth efficiency and low latency. Only one codec data frame can be sent in each Header-Free format packet. Neither the payload header fields (LLL, NNN, MMM, Count) nor ToC entries are present. The codec rate for the data frame can be determined from the length of the codec data frame, since there is only one codec data frame in each Header-Free packet.
Use of the RTP header fields for Header-Free RTP/Vocoder Packet Format is the same as described in Section 4.1 for Interleaved/Bundled RTP/Vocoder Packet Format. The detailed format of the codec data frame is specified in Section 5.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      RTP Header [4]                           |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |
   +          ONLY one codec data frame            +-+-+-+-+-+-+-+-+
   |                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
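Because a Header-Free packet carries exactly one codec data frame, a receiver can infer the rate from the payload length alone. The following non-normative sketch uses the EVRC and SMV frame sizes tabulated in Section 5; the names `RATE_BY_LENGTH` and `header_free_rate` are hypothetical.

```python
# Octet length of a codec data frame -> rate, for EVRC and SMV.
# A 5-octet frame (Rate 1/4) is valid for SMV only.
RATE_BY_LENGTH = {2: "1/8", 5: "1/4", 10: "1/2", 22: "1"}


def header_free_rate(payload: bytes, codec: str = "EVRC") -> str:
    """Infer the codec rate of a Header-Free payload from its length."""
    rate = RATE_BY_LENGTH.get(len(payload))
    if rate is None or (rate == "1/4" and codec == "EVRC"):
        raise ValueError(f"unexpected {codec} frame length: {len(payload)}")
    return rate
```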
All receivers SHOULD be able to process both packet formats. The sender MAY choose to use one or both packet formats.
A receiver MUST have prior knowledge of the packet format to correctly decode the RTP packets. When packets of both formats are used within the same session, different RTP payload type values MUST be used for each format to distinguish the packet formats. The association of payload type number with the packet format is done out-of-band, for example by SDP during the setup of a session.
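As a non-normative illustration, a receiver that accepts both formats in one session might select the parser from the RTP payload type. The payload type numbers 97 and 98 below are hypothetical values assumed to have been agreed out-of-band; they are not assigned by this document.

```python
# Hypothetical dynamic payload type numbers agreed out-of-band (e.g., via SDP).
FORMAT_BY_PAYLOAD_TYPE = {
    97: "interleaved/bundled",
    98: "header-free",
}


def packet_format(rtp_packet: bytes) -> str:
    """Return the packet format selected by the RTP payload type field."""
    if len(rtp_packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    payload_type = rtp_packet[1] & 0x7F  # low 7 bits of the second octet
    try:
        return FORMAT_BY_PAYLOAD_TYPE[payload_type]
    except KeyError:
        raise ValueError(f"unknown payload type {payload_type}") from None
```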
Each codec data frame in an Interleaved/Bundled packet has a corresponding Table of Contents (ToC) entry. The ToC entry indicates the rate of the codec frame. (Header-Free packets MUST NOT have a ToC field.)
Each ToC entry occupies four bits. The format of the bits is indicated below:
    0 1 2 3
   +-+-+-+-+
   |fr type|
   +-+-+-+-+
Frame Type: 4 bits
   The frame type indicates the type of the corresponding codec data frame in the RTP packet.
For EVRC and SMV codecs, the frame type values and size of the associated codec data frame are described in the table below:
   Value   Rate      Total codec data frame size (in octets)
   ---------------------------------------------------------
     0     Blank      0    (0 bit)
     1     1/8        2    (16 bits)
     2     1/4        5    (40 bits; not valid for EVRC)
     3     1/2       10    (80 bits)
     4     1         22    (171 bits; 5 padded at end with zeros)
     5     Erasure    0    (SHOULD NOT be transmitted by sender)
All values not listed in the above table MUST be considered reserved. A ToC entry with a reserved Frame Type value SHOULD be considered invalid. Note that the EVRC codec does not have 1/4 rate frames, thus frame type value 2 MUST be considered a reserved value when the EVRC codec is in use.
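The frame type table lends itself to a lookup that both validates ToC entries and locates each codec data frame in the packet. The sketch below is non-normative; `FRAME_TYPES`, `frame_size`, and `split_frames` are names chosen for illustration.

```python
# Frame Type value -> (rate name, codec data frame size in octets).
FRAME_TYPES = {
    0: ("Blank", 0),
    1: ("1/8", 2),
    2: ("1/4", 5),     # SMV only; reserved when EVRC is in use
    3: ("1/2", 10),
    4: ("1", 22),
    5: ("Erasure", 0),
}


def frame_size(frame_type: int, codec: str) -> int:
    """Size in octets of the frame a ToC entry describes; raise if reserved."""
    if frame_type not in FRAME_TYPES or (frame_type == 2 and codec == "EVRC"):
        raise ValueError(f"reserved Frame Type {frame_type} for {codec}")
    return FRAME_TYPES[frame_type][1]


def split_frames(frame_types: list[int], data: bytes, codec: str) -> list[bytes]:
    """Cut the codec data that follows the ToC into one frame per ToC entry."""
    frames, offset = [], 0
    for frame_type in frame_types:
        size = frame_size(frame_type, codec)
        if offset + size > len(data):
            raise ValueError("packet is shorter than its ToC implies")
        frames.append(data[offset:offset + size])
        offset += size
    return frames
```

Blank and Erasure entries yield empty frames, so the returned list always has one element per ToC entry.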
Other vocoders that use this packet format need to specify their own table of frame types and corresponding codec data frames.
The output of the vocoder MUST be converted into codec data frames for inclusion in the RTP payload. The conversions for EVRC and SMV codecs are specified below. (Note: Because the EVRC codec does not have Rate 1/4 frames, the specifications of 1/4 frames do not apply to EVRC codec data frames.) Other vocoders that use this packet format need to specify how to convert vocoder output data into frames.
The codec output data bits as numbered in EVRC and SMV are packed into octets. The lowest numbered bit (bit 1 for Rate 1, Rate 1/2, Rate 1/4 and Rate 1/8) is placed in the most significant bit (internet bit 0) of octet 1 of the codec data frame, the second lowest bit is placed in the second most significant bit of the first octet, the third lowest in the third most significant bit of the first octet, and so on. This continues until all of the bits have been placed in the codec data frame.
The remaining unused bits of the last octet of the codec data frame MUST be set to zero. Note that in EVRC and SMV this is only applicable to Rate 1 frames (171 bits) as the Rate 1/2 (80 bits), Rate 1/4 (40 bits, SMV only) and Rate 1/8 frames (16 bits) fit exactly into a whole number of octets.
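The packing rule can be sketched as follows. This non-normative example assumes the codec output is available as a list of 0/1 values ordered from the lowest-numbered bit; the function name `pack_codec_frame` is hypothetical.

```python
def pack_codec_frame(bits: list[int]) -> bytes:
    """Pack codec output bits (lowest-numbered first) MSB-first into octets.

    Unused bits of the last octet are left as zero.
    """
    frame = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            frame[i // 8] |= 0x80 >> (i % 8)  # bit i goes to position i mod 8
    return bytes(frame)
```

A 171-bit Rate 1 frame produces 22 octets, with the five least significant bits of the last octet equal to zero.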
Following is a detailed listing showing a Rate 1 EVRC/SMV codec output frame converted into a codec data frame:
The codec data frame for an EVRC/SMV codec Rate 1 frame is 22 octets long. Bits 1 through 171 from the EVRC/SMV codec Rate 1 frame are placed as indicated, with bits marked with "Z" set to zero. EVRC/SMV codec Rate 1/8, Rate 1/4 and Rate 1/2 frames are converted similarly, but do not require zero padding because they align on octet boundaries.
Rate 1 codec data frame
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| |0|0|0|0|0|0|0|0|0|1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|3| |1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ : : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1| | | | | | |4|4|4|4|4|5|5|5|5|5|5|5|5|5|5|6|6|6|6|6|6|6|6|6|6|7|7|Z|Z|Z|Z|Z| |5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1| | | | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| |0|0|0|0|0|0|0|0|0|1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|3| |1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ : : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1| | | | | | |4|4|4|4|4|5|5|5|5|5|5|5|5|5|5|6|6|6|6|6|6|6|6|6|6|7|7|Z|Z|Z|Z|Z| |5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1| | | | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
As indicated in Section 4.1, more than one codec data frame MAY be included in a single Interleaved/Bundled packet by a sender. This is accomplished by interleaving or bundling.
Bundling is used to spread the transmission overhead of the RTP and payload header over multiple vocoder frames. Interleaving additionally reduces the listener's perception of data loss by spreading such loss over non-consecutive vocoder frames. EVRC, SMV, and similar vocoders are able to compensate for an occasional lost frame, but speech quality degrades exponentially with consecutive frame loss.
Bundling is signaled by setting the LLL field to zero and the Count field to greater than zero. Interleaving is indicated by setting the LLL field to a value greater than zero.
The discussions on general interleaving apply to the bundling (which can be viewed as a reduced case of interleaving) with reduced complexity. The bundling case is discussed in detail in Section 7.
Senders MAY support interleaving and/or bundling. All receivers that support the Interleaved/Bundled packet format MUST support both interleaving and bundling.
Given a time-ordered sequence of output frames from the codec numbered 0..n, a bundling value B (the value in the Count field plus one), and an interleave length L where n = B * (L+1) - 1, the output frames are placed into RTP packets as follows (the values of the fields LLL and NNN are indicated for each RTP packet):
First RTP Packet in Interleave group:
   LLL=L, NNN=0
   Frame 0, Frame L+1, Frame 2(L+1), Frame 3(L+1), ... for a total of B frames
Second RTP Packet in Interleave group:
   LLL=L, NNN=1
   Frame 1, Frame 1+L+1, Frame 1+2(L+1), Frame 1+3(L+1), ... for a total of B frames
This continues to the last RTP packet in the interleave group:
L+1 RTP Packet in Interleave group:
   LLL=L, NNN=L
   Frame L, Frame L+L+1, Frame L+2(L+1), Frame L+3(L+1), ... for a total of B frames
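The placement rule above amounts to taking every (L+1)th frame, starting at the packet's interleave index. The following non-normative sketch makes this explicit; the function name `interleave` and the dictionary layout of a packet are assumptions of the example.

```python
def interleave(frames: list, L: int, B: int) -> list[dict]:
    """Distribute B * (L + 1) time-ordered frames over L + 1 RTP packets."""
    if len(frames) != B * (L + 1):
        raise ValueError("an interleave group needs exactly B * (L + 1) frames")
    packets = []
    for n in range(L + 1):
        # Packet n carries frames n, n + (L+1), n + 2(L+1), ...
        packets.append({"LLL": L, "NNN": n, "frames": frames[n::L + 1]})
    return packets
```

With L = 2 and B = 2, frames 0 through 5 are carried as (0, 3), (1, 4), and (2, 5). With L = 0 the group collapses to a single packet holding all B frames in order, which is the bundling case.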
Within each interleave group, the RTP packets making up the interleave group MUST be transmitted in value-increasing order of the NNN field. While this does not guarantee reduced end-to-end delay on the receiving end, when packets are delivered in order by the underlying transport, delay will be reduced to the minimum possible.
Receivers MAY signal the maximum number of codec data frames (i.e., the maximum acceptable bundling value B) they can handle in a single RTP packet using the OPTIONAL maxptime RTP mode parameter identified in Section 12.
Receivers MAY signal the maximum interleave length (i.e., the maximum acceptable LLL value in the Interleaving Octet) they will accept using the OPTIONAL maxinterleave RTP mode parameter identified in Section 12.
The parameters maxptime and maxinterleave are exchanged at the initial setup of the session. In one-to-one sessions, the sender MUST respect these values set by the receiver, and MUST NOT interleave/bundle more packets than the receiver has signaled that it can handle. This ensures that the receiver can allocate a known amount of buffer space that will be sufficient for all interleaving/bundling used in that session. During the session, the sender may decrease the bundling value or interleaving length (so that less buffer space is required at the receiver), but never exceed the maximum value set by the receiver. This prevents the situation where a receiver needs to allocate more buffer space in the middle of a session but is unable to do so.
Additionally, senders have the following restrictions:
o MUST NOT bundle more codec data frames in a single RTP packet than indicated by maxptime (see Section 12) if it is signaled.
o SHOULD NOT bundle more codec data frames in a single RTP packet than will fit in the MTU of the underlying network.
o Once beginning a session with a given maximum interleaving value set by maxinterleave in Section 12, MUST NOT increase the interleaving value (LLL) to exceed the maximum interleaving value that is signaled.
o MAY change the interleaving value, but MUST do so only between interleave groups.
o Silence suppression MUST only be used between interleave groups. A ToC with Frame Type 0 (Blank Frame, Section 5.1) MUST be used within interleaving groups if the codec outputs a blank frame. The M bit in the RTP header is not set for these blank frames, as the stream is continuous in time. Because there is only one time stamp for each RTP packet, silence suppression used within an interleave group would cause ambiguities when reconstructing the speech at the receiver side, and thus is prohibited.
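A sender might check the numeric restrictions above before emitting a packet, as in the following non-normative sketch. It assumes that maxptime is expressed in milliseconds, so that B frames of 20 milliseconds each must not exceed it; that unit, the function name, and the parameter names are assumptions of this example.

```python
FRAME_DURATION_MS = 20  # duration of one EVRC/SMV codec data frame


def check_sender_limits(B: int, L: int, maxptime_ms=None, maxinterleave=None,
                        payload_octets=0, mtu_payload_octets=None) -> list[str]:
    """Return the list of violated restrictions (empty if there are none)."""
    problems = []
    # Assumption: maxptime is a duration in milliseconds.
    if maxptime_ms is not None and B * FRAME_DURATION_MS > maxptime_ms:
        problems.append("bundling value exceeds maxptime")
    if maxinterleave is not None and L > maxinterleave:
        problems.append("interleave length exceeds maxinterleave")
    if mtu_payload_octets is not None and payload_octets > mtu_payload_octets:
        problems.append("packet will not fit in the MTU")
    return problems
```

Parameters that were not signaled are passed as `None` and impose no limit in this sketch.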
Given an RTP packet with sequence number S, interleave length (field LLL) L, interleave index value (field NNN) N, and bundling value B, the interleave group consists of this RTP packet and other RTP packets with sequence numbers from S-N mod 65536 to S-N+L mod 65536 inclusive. In other words, the interleave group always consists of L+1 RTP packets with sequential sequence numbers. The bundling value for all RTP packets in an interleave group MUST be the same.
给定序列号为S、交织长度(字段LLL)L、交织索引值(字段NNN)N和捆绑值B的RTP数据包,交织组由该RTP数据包和序列号为S-N mod 65536到S-N+L mod 65536(含)的其他RTP数据包组成。换句话说,交织组总是由具有序列号的L+1个RTP分组组成。交织组中所有RTP数据包的绑定值必须相同。
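The sequence-number arithmetic above can be sketched as follows. This is only an illustration: the function name is hypothetical, and the check that N lies between 0 and L is an assumption drawn from the requirement that packet S belongs to its own group.

```python
from typing import List

SEQ_MODULUS = 65536  # RTP sequence numbers are 16 bits wide


def interleave_group_sequence_numbers(s: int, n: int, l: int) -> List[int]:
    """Return the sequence numbers of the interleave group containing packet S.

    s: sequence number of the received packet
    n: interleave index (field NNN) of that packet
    l: interleave length (field LLL) of that packet
    """
    if not 0 <= n <= l:
        # Assumption: an index outside 0..L cannot belong to the group.
        raise ValueError("interleave index N must satisfy 0 <= N <= L")
    first = (s - n) % SEQ_MODULUS
    # The group always holds L+1 packets with sequential sequence numbers.
    return [(first + i) % SEQ_MODULUS for i in range(l + 1)]
```

For example, a packet with S=2, N=3, and L=4 belongs to a group that wraps around the 16-bit sequence space, starting at 65535 and ending at 3.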
The receiver determines the expected bundling value for all RTP packets in an interleave group by the number of codec data frames bundled in the first RTP packet of the interleave group received. Note that this may not be the first RTP packet of the interleave group if packets are delivered out of order by the underlying transport.
接收机通过在接收到的交织组的第一个RTP分组中捆绑的编解码器数据帧的数量来确定交织组中的所有RTP分组的预期捆绑值。注意,如果数据包被底层传输乱序传送,则这可能不是交织组的第一个RTP数据包。
As discussed in Section 6, the bundling of codec data frames is a special reduced case of interleaving with LLL value in the Interleave Octet set to 0.
如第6节所述,编解码器数据帧的捆绑是交织的一种特殊简化情况,交织八位组中的LLL值设置为0。
Bundling codec data frames indicates that multiple data frames are included consecutively in a packet, because the interleaving length (LLL) is 0. The interleaving group is thus reduced to a single RTP packet, and the reconstruction of the codec data frames from RTP packets becomes a much simpler process.
捆绑编解码器数据帧表示一个数据包中连续包含多个数据帧,因为交织长度(LLL)为0。因此,交织组被减少到单个RTP分组,并且从RTP分组重构编解码器数据帧成为更简单的过程。
Furthermore, the additional restrictions on senders are reduced to:
此外,对发件人的附加限制减少为:
o MUST NOT bundle more codec data frames in a single RTP packet than indicated by maxptime (see Section 12) if it is signaled.
o 在单个RTP数据包中捆绑的编解码器数据帧不得超过maxptime(见第12节)指示的数量(如果发出信号)。
o SHOULD NOT bundle more codec data frames in a single RTP packet than will fit in the MTU of the underlying network.
o 在单个RTP数据包中捆绑的编解码器数据帧不应超过底层网络的MTU所能容纳的数量。
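As a rough sketch of the maxptime restriction: each codec data frame covers 20 msec, and the default maxptime is 200 milliseconds when the parameter is not signaled (both values come from the parameter definitions in Section 12). The function name below is hypothetical.

```python
FRAME_DURATION_MS = 20     # duration of a single codec data frame
DEFAULT_MAXPTIME_MS = 200  # default when maxptime is not signaled


def max_frames_for_maxptime(maxptime_ms: int = DEFAULT_MAXPTIME_MS) -> int:
    """Return the largest number of frames a sender may bundle in one packet."""
    if maxptime_ms < FRAME_DURATION_MS:
        raise ValueError("maxptime must cover at least one codec data frame")
    # A maxptime that is not a multiple of 20 msec is rounded down.
    return maxptime_ms // FRAME_DURATION_MS
```

A sender would still need to compare the resulting packet size against the path MTU, which this sketch does not attempt.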
The vocoders covered by this payload format support erasure frames as an indication when frames are not available. The erasure frames are normally used internally by a receiver to advance the state of the voice decoder by exactly one frame time for each missing frame. Using the information from packet sequence number, time stamp, and the M bit, the receiver can detect missing codec data frames from RTP packet loss and/or silence suppression, and generate corresponding erasure frames. Erasure frames MUST also be used in storage format to record missing frames.
此有效负载格式所涵盖的声码器支持擦除帧作为帧不可用时的指示。擦除帧通常由接收机在内部使用,以便针对每个丢失帧将语音解码器的状态精确提前一帧时间。使用来自分组序列号、时间戳和M位的信息,接收机可以从RTP分组丢失和/或静默抑制中检测丢失的编解码器数据帧,并生成相应的擦除帧。擦除帧还必须以存储格式使用,以记录丢失的帧。
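One possible way for a receiver to count the erasure frames it needs is to compare RTP time stamps. The sketch below assumes packets without interleaving (LLL = 0) that are processed in order, and an 8000 Hz RTP clock, so that one 20 msec frame spans 160 time stamp units. The names are hypothetical.

```python
TICKS_PER_FRAME = 160  # 20 msec at an assumed 8000 Hz RTP clock
TS_MODULUS = 2 ** 32   # RTP time stamps are 32 bits wide


def missing_frames(prev_timestamp: int, prev_frame_count: int,
                   timestamp: int) -> int:
    """Return how many erasure frames fill the gap before the new packet.

    prev_timestamp:   time stamp of the previous packet played out
    prev_frame_count: number of codec data frames bundled in that packet
    timestamp:        time stamp of the packet that has just arrived
    """
    expected = (prev_timestamp + prev_frame_count * TICKS_PER_FRAME) % TS_MODULUS
    gap = (timestamp - expected) % TS_MODULUS
    if gap % TICKS_PER_FRAME:
        raise ValueError("time stamp gap is not a whole number of frames")
    return gap // TICKS_PER_FRAME
```

The same count applies whether the gap was caused by packet loss or by silence suppression, since in both cases the decoder state is advanced by one frame time per missing frame.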
The vocoder interpolates the missing speech content when given an erasure frame. However, the best quality is perceived by the listener when erasure frames are not consecutive. This makes interleaving desirable as it increases speech quality when packet loss occurs.
当给定擦除帧时,声码器对丢失的语音内容进行插值。然而,当擦除帧不是连续的时,听者会感觉到最好的质量。这使得交织成为人们所期望的,因为当分组丢失发生时,它提高了语音质量。
On the other hand, interleaving can greatly increase the end-to-end delay. Where an interactive session is desired, either Interleaved/Bundled packet format with interleaving length (field LLL) 0 or Header-Free packet format is RECOMMENDED.
另一方面,交织可以大大增加端到端延迟。如果需要交互式会话,建议使用交织长度(字段LLL)为0的交织/捆绑数据包格式或无报头数据包格式。
When end-to-end delay is not a primary concern, an interleaving length (field LLL) of 4 or 5 is RECOMMENDED as it offers a reasonable compromise between robustness and latency.
当端到端延迟不是主要问题时,建议交错长度(字段LLL)为4或5,因为它在健壮性和延迟之间提供了合理的折衷。
When receiving an RTP packet, the receiver SHOULD check the validity of the ToC fields and match the length of the packet with what is indicated by the ToC fields. If any invalidity or mismatch is detected, it is RECOMMENDED to discard the received packet to avoid potential severe degradation of the speech quality. The discarded packet is treated following the same procedure as a lost packet, and the discarded data will be replaced with erasure frames.
接收RTP数据包时,接收方应检查ToC字段的有效性,并将数据包的长度与ToC字段指示的长度相匹配。如果检测到任何无效或不匹配,建议丢弃接收到的数据包,以避免语音质量可能严重下降。丢弃的数据包按照与丢失的数据包相同的过程进行处理,丢弃的数据将被擦除帧替换。
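A minimal sketch of the length check follows. It assumes the ToC entries have already been extracted from the packet, and it uses a table of frame sizes in octets per frame type for EVRC. Those sizes are an assumption here and should be verified against Section 5.1 and the codec specification before use.

```python
from typing import Dict, Iterable

# Assumed EVRC frame sizes in octets, keyed by ToC frame type:
# 0 = blank, 1 = 1/8 rate, 3 = 1/2 rate, 4 = full rate.
EVRC_FRAME_OCTETS: Dict[int, int] = {0: 0, 1: 2, 3: 10, 4: 22}


def toc_matches_payload(toc_entries: Iterable[int], frame_data_len: int,
                        frame_octets: Dict[int, int] = EVRC_FRAME_OCTETS) -> bool:
    """Return True if every ToC entry is valid and the sizes add up."""
    total = 0
    for toc in toc_entries:
        size = frame_octets.get(toc)
        if size is None:
            return False  # frame type not valid for this codec
        total += size
    return total == frame_data_len
```

A packet that fails this check would be discarded and then handled like a lost packet.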
On receipt of an RTP packet with an invalid value of the LLL or NNN fields, the RTP packet SHOULD be treated as lost by the receiver for the purpose of generating erasure frames as described in Section 8.
在接收到具有LLL或NNN字段的无效值的RTP分组时,接收机应将RTP分组视为丢失,以生成第8节中所述的擦除帧。
On receipt of an RTP packet in an interleave group with other than the expected frame count value, the receiver MAY discard codec data frames off the end of the RTP packet or add erasure codec data frames to the end of the packet in order to manufacture a substitute packet with the expected bundling value. The receiver MAY instead choose to discard the whole interleave group.
在接收到交织组中具有非预期帧计数值的RTP分组时,接收机可以丢弃RTP分组末端的编解码器数据帧,或者向分组末端添加擦除编解码器数据帧,以便制造具有预期捆绑值的替代分组。接收机可以选择丢弃整个交织组。
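Manufacturing a substitute packet can be sketched as below. Frames are modeled as (ToC, data) pairs, and an erasure frame is modeled as frame type 5 with no data; both the representation and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[int, bytes]
ERASURE_TOC = 5  # erasure frame type


def conform_to_bundling(frames: Sequence[Frame], expected: int) -> List[Frame]:
    """Return exactly `expected` frames, truncating or padding at the end."""
    result = list(frames[:expected])  # drop extra frames off the end
    # Pad a short packet with erasure frames.
    result += [(ERASURE_TOC, b"")] * (expected - len(result))
    return result
```

Discarding the whole interleave group instead remains a valid choice for the receiver.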
Assume that the receiver has begun playing frames from an interleave group. The time has come to play frame x from packet n of the interleave group. Further assume that packet n of the interleave group has not been received. As described in Section 8, an erasure frame will be sent to the receiving vocoder.
假设接收器已开始播放交织组中的帧。从交织组的分组n播放帧x的时间到了。进一步假设交织组的分组n尚未被接收。如第8节所述,擦除帧将被发送到接收声码器。
Now, assume that packet n of the interleave group arrives before frame x+1 of that packet is needed. Receivers should use frame x+1 of the newly received packet n rather than substituting an erasure frame. In other words, just because packet n was not available the first time it was needed to reconstruct the interleaved speech, the receiver should not assume it is not available when it is subsequently needed for interleaved speech reconstruction.
现在,假设交织组的分组n在需要该分组的帧x+1之前到达。接收机应使用新接收的分组n的帧x+1,而不是替换擦除帧。换言之,仅仅因为分组n在第一次需要重建交织语音时不可用,所以接收机不应该在随后需要它进行交织语音重建时假设它不可用。
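The behavior described above amounts to consulting the receive buffer again at every playout instant rather than marking a packet as permanently lost. A small sketch, with a hypothetical dictionary keyed by the packet's index within the interleave group:

```python
from typing import Dict, List, Tuple

Frame = Tuple[int, bytes]
ERASURE_TOC = 5  # erasure frame type


def frame_for_playout(received: Dict[int, List[Frame]],
                      packet_index: int, frame_index: int) -> Frame:
    """Return the frame to play, or an erasure frame if it is unavailable now."""
    frames = received.get(packet_index)
    if frames is None or frame_index >= len(frames):
        return (ERASURE_TOC, b"")
    return frames[frame_index]


# Packet 2 is missing when its first frame is due, so an erasure is played.
received: Dict[int, List[Frame]] = {}
first = frame_for_playout(received, 2, 0)

# Packet 2 arrives late; its second frame is still used when it is due.
received[2] = [(4, b"a" * 22), (3, b"b" * 10)]
second = frame_for_playout(received, 2, 1)
```

Because the lookup is repeated for each frame, a late packet contributes all of its frames that are still ahead of the playout point.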
The Mode Request signal requests a particular encoding mode for the speech encoding in the reverse direction. All implementations are RECOMMENDED to honor the Mode Request signal. The Mode Request signal SHOULD only be used in one-to-one sessions. In multi-party sessions, any received Mode Request signals SHOULD be ignored.
模式请求信号请求反向语音编码的特定编码模式。建议所有实施方式都遵守模式请求信号。模式请求信号只能在一对一会话中使用。在多方会话中,应忽略任何接收到的模式请求信号。
In addition, the Mode Request signal MAY also be sent through non-RTP means, which is out of the scope of this specification.
此外,模式请求信号也可以通过非RTP方式发送,这超出了本规范的范围。
The three-bit Mode Request field signals the receiver to switch its audio encoder to a particular encoding mode. If the Mode Request field is set to a valid value in RTP packets from node A to node B, it is a request for node B to change its audio encoder to the requested encoding mode, and therefore to change the bit rate of the RTP stream from node B to node A. Once a node sets this field to a value, it SHOULD continue to set the field to the same value in subsequent packets until it wants to request a different mode. This design helps avoid leaving the codec stuck in an unintended state if one of the packets that carries the Mode Request is lost. An otherwise silent node MAY send an RTP packet containing a blank frame in order to send a Mode Request.
三位模式请求字段用于通知接收方将其音频编码器切换到特定的编码模式。如果在从节点A到节点B的RTP数据包中,模式请求字段被设置为有效值,则表示请求节点B将其音频编码器更改为所请求的编码模式,从而更改从节点B到节点A的RTP流的比特率。一旦节点将此字段设置为某个值,它应该在随后的数据包中继续将该字段设置为相同的值,直到它要请求不同的模式为止。这种设计有助于避免在承载模式请求的某个数据包丢失时使编解码器陷入意外状态。原本处于静默状态的节点可以发送包含空白帧的RTP分组,以发送模式请求。
Each codec type using this format SHOULD define its own interpretation of the Mode Request field. Codecs SHOULD follow the convention that higher values of the three-bit field correspond to an equal or lower average output bit rate.
使用此格式的每种编解码器类型都应该定义自己对模式请求字段的解释。编解码器应遵循三位字段的较高值对应于相等或较低平均输出比特率的约定。
For the EVRC codec, the Mode Request field MUST be interpreted according to Tables 2.2.1.2-1 and 2.2.1.2-2 of the EVRC codec specifications [1].
对于EVRC编解码器,必须根据EVRC编解码器规范的表2.2.1.2-1和表2.2.1.2-2解释模式请求字段[1]。
For SMV codec, the Mode Request field MUST be interpreted according to Table 2.2-2 of the SMV codec specifications [2].
对于SMV编解码器,模式请求字段必须根据SMV编解码器规范[2]的表2.2-2进行解释。
The storage format is used for storing speech frames, e.g., as a file or e-mail attachment.
存储格式用于存储语音帧,例如作为文件或电子邮件附件。
The file begins with a magic number to identify the vocoder that is used. The magic number for EVRC corresponds to the ASCII character string "#!EVRC\n", i.e., "0x23 0x21 0x45 0x56 0x52 0x43 0x0A". The magic number for SMV corresponds to the ASCII character string "#!SMV\n", i.e., "0x23 0x21 0x53 0x4d 0x56 0x0a".
该文件以一个幻数开始,以标识所使用的声码器。EVRC的幻数对应于ASCII字符串“#!EVRC\n”,即“0x23 0x21 0x45 0x56 0x52 0x43 0x0A”。SMV的幻数对应于ASCII字符串“#!SMV\n”,即“0x23 0x21 0x53 0x4d 0x56 0x0a”。
The codec data frames are stored in consecutive order, with a single ToC entry field, extended to one octet, prefixing each codec data frame. The ToC field is extended to one octet by setting the four most significant bits of the octet to zero. For example, a ToC value of 4 (a full-rate frame) is stored as 0x04.
编解码器数据帧以连续顺序存储,每个编解码器数据帧前面带有一个扩展为一个八位字节的ToC条目字段。通过将八位字节的四个最高有效位设置为零,ToC字段扩展为一个八位字节。例如,ToC值4(全速率帧)存储为0x04。
Speech frames lost in transmission and non-received frames MUST be stored as erasure frames (frame type 5, see definition in Section 5.1) to maintain synchronization with the original media.
传输中丢失的语音帧和未接收的帧必须存储为擦除帧(帧类型5,见第5.1节中的定义),以保持与原始媒体的同步。
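A minimal writer for the storage format might look like the sketch below. It assumes frames are supplied as (ToC, data) pairs and that an erasure frame (type 5) carries no data octets. The function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Frame = Tuple[int, bytes]

# Magic numbers that open a stored file, one per vocoder.
MAGIC = {"EVRC": b"#!EVRC\n", "SMV": b"#!SMV\n"}
ERASURE_TOC = 5  # erasure frame type


def write_storage(codec: str, frames: Iterable[Frame]) -> bytes:
    """Serialize frames as: magic number, then (ToC octet + data) per frame."""
    out = bytearray(MAGIC[codec])
    for toc, data in frames:
        if not 0 <= toc <= 0x0F:
            raise ValueError("ToC value must fit in four bits")
        out.append(toc)  # the four most significant bits stay zero
        out += data
    return bytes(out)


def detect_codec(stored: bytes) -> Optional[str]:
    """Return the codec name indicated by the magic number, if any."""
    for name, magic in MAGIC.items():
        if stored.startswith(magic):
            return name
    return None
```

A lost frame would be written as `(ERASURE_TOC, b"")` so that the stored stream stays synchronized with the original media.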
Four new MIME sub-types as described in this section have been registered by the IANA.
IANA已经注册了本节中描述的四个新MIME子类型。
The MIME-names for the EVRC and SMV codec are allocated from the IETF tree since all the vocoders covered are expected to be widely used for Voice-over-IP applications.
EVRC和SMV编解码器的MIME名称是从IETF树中分配的,因为所涵盖的所有声码器预计将广泛用于IP语音应用。
Media Type Name: audio
媒体类型名称:音频
Media Subtype Name: EVRC
媒体子类型名称:EVRC
Required Parameter: none
必需参数:无
Optional parameters: The following parameters apply to RTP transfer only.
可选参数:以下参数仅适用于RTP传输。
ptime: Defined as usual for RTP audio (see RFC 2327).
ptime:与RTP音频的常规定义相同(参见RFC 2327)。
maxptime: The maximum amount of media which can be encapsulated in each packet, expressed as time in milliseconds. The time SHALL be calculated as the sum of the time the media present in the packet represents. The time SHOULD be a multiple of the duration of a single codec data frame (20 msec). If not signaled, the default maxptime value SHALL be 200 milliseconds.
maxptime:每个数据包中可封装的最大媒体量,以毫秒表示。时间应计算为数据包中存在的媒体代表的时间之和。时间应该是单个编解码器数据帧持续时间的倍数(20毫秒)。如果未发出信号,默认的maxptime值应为200毫秒。
maxinterleave: Maximum number for interleaving length (field LLL in the Interleaving Octet). The interleaving lengths used in the entire session MUST NOT exceed this maximum value. If not signaled, the maxinterleave length SHALL be 5.
maxinterleave:交织长度的最大数目(交织八位字节中的字段LLL)。整个会话中使用的交织长度不得超过此最大值。如果未发出信号,最大交织长度应为5。
Encoding considerations: This type is defined for transfer of EVRC-encoded data via RTP using the Interleaved/Bundled packet format specified in Sections 4.1, 6, and 7 of RFC 3558. It is also defined for other transfer methods using the storage format specified in Section 11 of RFC 3558.
编码注意事项:该类型定义用于使用RFC 3558第4.1、6和7节中规定的交织/捆绑包格式,通过RTP传输EVRC编码数据。还定义了使用RFC 3558第11节规定的存储格式的其他传输方法。
Security considerations: See Section 14 "Security Considerations" of RFC 3558.
安全注意事项:见RFC 3558第14节“安全注意事项”。
Public specification: The EVRC vocoder is specified in 3GPP2 C.S0014. Transfer methods are specified in RFC 3558.
公共规范:3GPP2 C.S0014中规定了EVRC声码器。RFC 3558中规定了传输方法。
Additional information: The following information applies to the storage format only.
附加信息:以下信息仅适用于存储格式。
Magic number: #!EVRC\n (see Section 11 of RFC 3558)
File extensions: evc, EVC
Macintosh file type code: none
Object identifier or OID: none
Intended usage: COMMON. It is expected that many VoIP applications (as well as mobile applications) will use this type.
预期用途:普通。预计许多VoIP应用程序(以及移动应用程序)将使用这种类型。
Person & email address to contact for further information: Adam Li adamli@icsl.ucla.edu
联系人和电子邮件地址,以获取更多信息:Adam Li adamli@icsl.ucla.edu
Author/Change controller: Adam Li adamli@icsl.ucla.edu IETF Audio/Video Transport Working Group
作者/变更控制人:Adam Li adamli@icsl.ucla.edu,IETF音频/视频传输工作组
Media Type Name: audio
媒体类型名称:音频
Media Subtype Name: EVRC0
媒体子类型名称:EVRC0
Required Parameters: none
所需参数:无
Optional parameters: none
可选参数:无
Encoding considerations: none. This type is only defined for transfer of EVRC-encoded data via RTP using the Header-Free packet format specified in Section 4.2 of RFC 3558.
编码注意事项:无。此类型仅定义用于使用RFC 3558第4.2节规定的无报头数据包格式通过RTP传输EVRC编码数据。
Security considerations: See Section 14 "Security Considerations" of RFC 3558.
安全注意事项:见RFC 3558第14节“安全注意事项”。
Public specification: The EVRC vocoder is specified in 3GPP2 C.S0014. Transfer methods are specified in RFC 3558.
公共规范:3GPP2 C.S0014中规定了EVRC声码器。RFC 3558中规定了传输方法。
Additional information: none
其他信息:无
Intended usage: COMMON. It is expected that many VoIP applications (as well as mobile applications) will use this type.
预期用途:普通。预计许多VoIP应用程序(以及移动应用程序)将使用这种类型。
Person & email address to contact for further information: Adam Li adamli@icsl.ucla.edu
联系人和电子邮件地址,以获取更多信息:Adam Li adamli@icsl.ucla.edu
Author/Change controller: Adam Li adamli@icsl.ucla.edu IETF Audio/Video Transport Working Group
作者/变更控制人:Adam Li adamli@icsl.ucla.edu,IETF音频/视频传输工作组
Media Type Name: audio
媒体类型名称:音频
Media Subtype Name: SMV
媒体子类型名称:SMV
Required Parameter: none
必需参数:无
Optional parameters: The following parameters apply to RTP transfer only.
可选参数:以下参数仅适用于RTP传输。
ptime: Defined as usual for RTP audio (see RFC 2327).
ptime:与RTP音频的常规定义相同(参见RFC 2327)。
maxptime: The maximum amount of media which can be encapsulated in each packet, expressed as time in milliseconds. The time SHALL be calculated as the sum of the time the media present in the packet represents. The time SHOULD be a multiple of the duration of a single codec data frame (20 msec). If not signaled, the default maxptime value SHALL be 200 milliseconds.
maxptime:每个数据包中可封装的最大媒体量,以毫秒表示。时间应计算为数据包中存在的媒体代表的时间之和。时间应该是单个编解码器数据帧持续时间的倍数(20毫秒)。如果未发出信号,默认的maxptime值应为200毫秒。
maxinterleave: Maximum number for interleaving length (field LLL in the Interleaving Octet). The interleaving lengths used in the entire session MUST NOT exceed this maximum value. If not signaled, the maxinterleave length SHALL be 5.
maxinterleave:交织长度的最大数目(交织八位字节中的字段LLL)。整个会话中使用的交织长度不得超过此最大值。如果未发出信号,最大交织长度应为5。
Encoding considerations: This type is defined for transfer of SMV-encoded data via RTP using the Interleaved/Bundled packet format specified in Sections 4.1, 6, and 7 of RFC 3558. It is also defined for other transfer methods using the storage format specified in Section 11 of RFC 3558.
编码注意事项:该类型定义用于使用RFC 3558第4.1、6和7节中规定的交织/捆绑包格式通过RTP传输SMV编码数据。还定义了使用RFC 3558第11节规定的存储格式的其他传输方法。
Security considerations: See Section 14 "Security Considerations" of RFC 3558.
安全注意事项:见RFC 3558第14节“安全注意事项”。
Public specification: The SMV vocoder is specified in 3GPP2 C.S0030-0 v2.0. Transfer methods are specified in RFC 3558.
公共规范:SMV声码器在3GPP2 C.S0030-0 v2.0中有规定。RFC 3558中规定了传输方法。
Additional information: The following information applies to storage format only.
附加信息:以下信息仅适用于存储格式。
Magic number: #!SMV\n (see Section 11 of RFC 3558)
File extensions: smv, SMV
Macintosh file type code: none
Object identifier or OID: none
Intended usage: COMMON. It is expected that many VoIP applications (as well as mobile applications) will use this type.
预期用途:普通。预计许多VoIP应用程序(以及移动应用程序)将使用这种类型。
Person & email address to contact for further information: Adam Li adamli@icsl.ucla.edu
联系人和电子邮件地址,以获取更多信息:Adam Li adamli@icsl.ucla.edu
Author/Change controller: Adam Li adamli@icsl.ucla.edu IETF Audio/Video Transport Working Group
作者/变更控制人:Adam Li adamli@icsl.ucla.edu,IETF音频/视频传输工作组
Media Type Name: audio
媒体类型名称:音频
Media Subtype Name: SMV0
媒体子类型名称:SMV0
Required Parameter: none
必需参数:无
Optional parameters: none
可选参数:无
Encoding considerations: none. This type is only defined for transfer of SMV-encoded data via RTP using the Header-Free packet format specified in Section 4.2 of RFC 3558.
编码注意事项:无。此类型仅定义用于使用RFC 3558第4.2节中规定的无报头数据包格式通过RTP传输SMV编码数据。
Security considerations: See Section 14 "Security Considerations" of RFC 3558.
安全注意事项:见RFC 3558第14节“安全注意事项”。
Public specification: The SMV vocoder is specified in 3GPP2 C.S0030-0 v2.0. Transfer methods are specified in RFC 3558.
公共规范:SMV声码器在3GPP2 C.S0030-0 v2.0中有规定。RFC 3558中规定了传输方法。
Additional information: none
其他信息:无
Intended usage: COMMON. It is expected that many VoIP applications (as well as mobile applications) will use this type.
预期用途:普通。预计许多VoIP应用程序(以及移动应用程序)将使用这种类型。
Person & email address to contact for further information: Adam Li adamli@icsl.ucla.edu
联系人和电子邮件地址,以获取更多信息:Adam Li adamli@icsl.ucla.edu
Author/Change controller: Adam Li adamli@icsl.ucla.edu IETF Audio/Video Transport Working Group
作者/变更控制人:Adam Li adamli@icsl.ucla.edu,IETF音频/视频传输工作组
Please note that this section applies to the RTP transfer only.
请注意,本节仅适用于RTP传输。
The information carried in the MIME media type specification has a specific mapping to fields in the Session Description Protocol (SDP) [6], which is commonly used to describe RTP sessions. When SDP is used to specify sessions employing the EVRC or SMV codec, the mapping is as follows:
MIME媒体类型规范中包含的信息具有到会话描述协议(SDP)[6]中字段的特定映射,该协议通常用于描述RTP会话。当使用SDP指定使用EVRC或SMV编解码器的会话时,映射如下:
o The MIME type ("audio") goes in SDP "m=" as the media name.
o MIME类型(“音频”)以SDP“m=”作为媒体名称。
o The MIME subtype (payload format name) goes in SDP "a=rtpmap" as the encoding name.
o MIME子类型(有效负载格式名称)以SDP“a=rtpmap”作为编码名称。
o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and "a=maxptime" attributes, respectively.
o 参数“ptime”和“maxptime”分别位于SDP“a=ptime”和“a=maxptime”属性中。
o The parameter "maxinterleave" goes in the SDP "a=fmtp" attribute by copying it directly from the MIME media type string as "maxinterleave=value".
o 参数“maxinterleave”直接从MIME媒体类型字符串复制为“maxinterleave=value”,从而进入SDP“a=fmtp”属性。
Some examples of SDP session descriptions for EVRC and SMV encodings follow below.
下面是EVRC和SMV编码的SDP会话描述的一些示例。
Example of usage of EVRC:
EVRC的使用示例:
m=audio 49120 RTP/AVP 97
a=rtpmap:97 EVRC/8000
a=fmtp:97 maxinterleave=2
a=maxptime:80
Example of usage of SMV:
SMV的使用示例:
m=audio 49122 RTP/AVP 99
a=rtpmap:99 SMV0/8000
a=fmtp:99
Note that the payload format (encoding) names are commonly shown in upper case. MIME subtypes are commonly shown in lower case. These names are case-insensitive in both places. Similarly, parameter names are case-insensitive both in MIME types and in the default mapping to the SDP a=fmtp attribute.
请注意,有效负载格式(编码)名称通常以大写形式显示。MIME子类型通常以小写形式显示。这些名称在两个位置都不区分大小写。类似地,参数名在MIME类型和到SDP a=fmtp属性的默认映射中都不区分大小写。
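The mapping rules can be illustrated with a small helper that assembles the SDP lines for one media stream. The function and its parameter names are hypothetical, and the 8000 Hz clock rate is taken from the examples above.

```python
from typing import List, Optional


def sdp_media_lines(port: int, payload_type: int, subtype: str,
                    clock_rate: int = 8000,
                    maxinterleave: Optional[int] = None,
                    ptime: Optional[int] = None,
                    maxptime: Optional[int] = None) -> List[str]:
    """Build the SDP lines that describe one EVRC or SMV audio stream."""
    lines = [
        f"m=audio {port} RTP/AVP {payload_type}",            # MIME type in "m="
        f"a=rtpmap:{payload_type} {subtype}/{clock_rate}",   # subtype in rtpmap
    ]
    if maxinterleave is not None:
        # maxinterleave is copied into a=fmtp as "maxinterleave=value".
        lines.append(f"a=fmtp:{payload_type} maxinterleave={maxinterleave}")
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```

Calling `sdp_media_lines(49120, 97, "EVRC", maxinterleave=2, maxptime=80)` reproduces the four lines of the EVRC example.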
RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [4], and any appropriate profile (for example [5]). This implies that confidentiality of the media streams is achieved by encryption. Because the data compression used with this payload format is applied end-to-end, encryption may be performed after compression so there is no conflict between the two operations.
使用本规范中定义的有效负载格式的RTP数据包受RTP规范[4]和任何适当配置文件(例如[5])中讨论的安全注意事项的约束。这意味着媒体流的机密性是通过加密实现的。由于与此有效负载格式一起使用的数据压缩是端到端应用的,因此可以在压缩之后执行加密,因此两个操作之间没有冲突。
A potential denial-of-service threat exists for data encodings using compression techniques that have non-uniform receiver-end computational load. The attacker can inject pathological datagrams into the stream which are complex to decode and cause the receiver to become overloaded. However, the encodings covered in this document do not exhibit any significant non-uniformity.
使用具有非均匀接收端计算负载的压缩技术进行数据编码存在潜在的拒绝服务威胁。攻击者可以向流中注入病理数据报,这些数据报解码复杂,并导致接收器过载。然而,本文件中涵盖的编码并未表现出任何显著的不一致性。
As with any IP-based protocol, in some circumstances, a receiver may be overloaded simply by the receipt of too many packets, either desired or undesired. Network-layer authentication may be used to discard packets from undesired sources, but the processing cost of the authentication itself may be too high. In a multicast environment, pruning of specific sources may be implemented in future versions of IGMP [7] and in multicast routing protocols to allow a receiver to select which sources are allowed to reach it.
与任何基于IP的协议一样,在某些情况下,接收机可能仅仅因为接收了太多的数据包而过载,不管是想要的还是不想要的。网络层认证可用于丢弃来自不希望的源的数据包,但认证本身的处理成本可能过高。在多播环境中,可以在IGMP[7]的未来版本和多播路由协议中实现特定源的修剪,以允许接收机选择允许哪些源到达它。
Interleaving may affect encryption. Depending on the encryption scheme used, there may be restrictions on, for example, the time when keys can be changed. Specifically, the key change may need to occur at the boundary between interleave groups.
交错可能会影响加密。根据所使用的加密方案,可能会有一些限制,例如,更改密钥的时间。具体地说,密钥改变可能需要发生在交织组之间的边界处。
As described above, the RTP packet format defined in this document is very flexible and designed to be usable by other frame-based vocoders.
如上所述,本文档中定义的RTP数据包格式非常灵活,可供其他基于帧的声码器使用。
Additional vocoders using this format MUST have properties as described in Section 3.3.
使用此格式的其他声码器必须具有第3.3节所述的特性。
For an eligible vocoder to use the payload format mechanisms defined in this document, a new RTP payload format document needs to be published as a standards track RFC. That document can simply refer to this document and then specify the following parameters:
为了使合格的声码器使用本文件中定义的有效载荷格式机制,需要将新的RTP有效载荷格式文件发布为标准跟踪RFC。该文档可以简单地引用该文档,然后指定以下参数:
o Define the unit used for the RTP time stamp;
o 定义用于RTP时间戳的单位;
o Define the meaning of the Mode Request bits;
o 定义模式请求位的含义;
o Define corresponding codec data frame type values for the ToC;
o 为ToC定义相应的编解码器数据帧类型值;
o Define the conversion procedure for the vocoder's output data frames;
o 定义声码器输出数据帧的转换程序;
o Define a magic number for the storage format, and complete the corresponding MIME registration.
o 为存储格式定义一个幻数,并完成相应的MIME注册。
The following authors have made significant contributions to this document: Adam H. Li, John D. Villasenor, Dong-Seek Park, Jeong-Hoon Park, Keith Miller, S. Craig Greer, David Leon, Nikolai Leung, Marcello Lioy, Kyle J. McKay, Magdalena L. Espelien, Randall Gellens, Tom Hiller, Peter J. McCann, Stinson S. Mathai, Michael D. Turner, Ajay Rajkumar, Dan Gal, Magnus Westerlund, Lars-Erik Jonsson, Greg Sherwood, and Thomas Zeng.
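以下作者对本文件做出了重要贡献:Adam H. Li、John D. Villasenor、Dong-Seek Park、Jeong-Hoon Park、Keith Miller、S. Craig Greer、David Leon、Nikolai Leung、Marcello Lioy、Kyle J. McKay、Magdalena L. Espelien、Randall Gellens、Tom Hiller、Peter J. McCann、Stinson S. Mathai、Michael D. Turner、Ajay Rajkumar、Dan Gal、Magnus Westerlund、Lars-Erik Jonsson、Greg Sherwood和Thomas Zeng。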
[1] 3GPP2 C.S0014, "Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems", January 1997.
[1] 3GPP2 C.S0014,“用于宽带扩频数字系统的增强型可变速率编解码器,语音服务选项3”,1997年1月。
[2] 3GPP2 C.S0030-0 v2.0, "Selectable Mode Vocoder, Service Option for Wideband Spread Spectrum Communication Systems", May 2002.
[2] 3GPP2 C.S0030-0 v2.0,“可选模式声码器,宽带扩频通信系统的服务选项”,2002年5月。
[3] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[3] Bradner,S.,“RFC中用于表示需求水平的关键词”,BCP 14,RFC 2119,1997年3月。
[4] Schulzrinne, H., Casner, S., Jacobson, V. and R. Frederick, "RTP: A Transport Protocol for Real-Time Applications", RFC 3550, July 2003.
[4] Schulzrinne,H.,Casner,S.,Jacobson,V.和R.Frederick,“RTP:实时应用的传输协议”,RFC 3550,2003年7月。
[5] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", RFC 3551, July 2003.
[5] Schulzrinne,H.和S.Casner,“具有最小控制的音频和视频会议的RTP配置文件”,RFC 3551,2003年7月。
[6] Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.
[6] Handley,M.和V.Jacobson,“SDP:会话描述协议”,RFC 2327,1998年4月。
[7] Deering, S., "Host Extensions for IP Multicasting", STD 5, RFC 1112, August 1989.
[7] Deering,S.,“IP多播的主机扩展”,STD 5,RFC 1112,1989年8月。
Adam H. Li Image Communication Lab Electrical Engineering Department University of California Los Angeles, CA 90095 USA
Adam H. Li,加利福尼亚大学电气工程系图像通信实验室,美国加利福尼亚州洛杉矶,邮编90095
Phone: +1 310 825 5178
EMail: adamli@icsl.ucla.edu
Copyright (C) The Internet Society (2003). All Rights Reserved.
版权所有(C)互联网协会(2003年)。版权所有。
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
本文件及其译本可复制并提供给他人,对其进行评论或解释或协助其实施的衍生作品可全部或部分编制、复制、出版和分发,不受任何限制,前提是上述版权声明和本段包含在所有此类副本和衍生作品中。但是,不得以任何方式修改本文件本身,例如删除版权通知或对互联网协会或其他互联网组织的引用,除非出于制定互联网标准的需要,在这种情况下,必须遵循互联网标准过程中定义的版权程序,或根据需要将其翻译成英语以外的其他语言。
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
上述授予的有限许可是永久性的,互联网协会或其继承人或受让人不会撤销。
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
本文件和其中包含的信息是按“原样”提供的,互联网协会和互联网工程任务组否认所有明示或暗示的保证,包括但不限于任何保证,即使用本文中的信息不会侵犯任何权利,或对适销性或特定用途适用性的任何默示保证。
Acknowledgement
确认
Funding for the RFC Editor function is currently provided by the Internet Society.
RFC编辑功能的资金目前由互联网协会提供。