Network Working Group                                             J. Rey
Request for Comments: 4396                                     Y. Matsui
Category: Standards Track                                      Panasonic
                                                           February 2006
RTP Payload Format for 3rd Generation Partnership Project (3GPP) Timed Text
Status of This Memo
This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2006).
Abstract
This document specifies an RTP payload format for the transmission of 3GPP (3rd Generation Partnership Project) timed text. 3GPP timed text is a time-lined, decorated text media format with defined storage in a 3GP file. Timed Text can be synchronized with audio/video contents and used in applications such as captioning, titling, and multimedia presentations. In the following sections, the problems of streaming timed text are addressed, and a payload format for streaming 3GPP timed text over RTP is specified.
Table of Contents
1. Introduction
2. Motivation, Requirements, and Design Rationale
   2.1. Motivation
   2.2. Basic Components of the 3GPP Timed Text Media Format
   2.3. Requirements
   2.4. Limitations
   2.5. Design Rationale
3. Terminology
4. RTP Payload Format for 3GPP Timed Text
   4.1. Payload Header Definitions
        4.1.1. Common Payload Header Fields
        4.1.2. TYPE 1 Header
        4.1.3. TYPE 2 Header
        4.1.4. TYPE 3 Header
        4.1.5. TYPE 4 Header
        4.1.6. TYPE 5 Header
   4.2. Buffering of Sample Descriptions
        4.2.1. Dynamic SIDX Wraparound Mechanism
   4.3. Finding Payload Header Values in 3GP Files
   4.4. Fragmentation of Timed Text Samples
   4.5. Reassembling Text Samples at the Receiver
   4.6. On Aggregate Payloads
   4.7. Payload Examples
   4.8. Relation to RFC 3640
   4.9. Relation to RFC 2793
5. Resilient Transport
6. Congestion Control
7. Scene Description
   7.1. Text Rendering Position and Composition
   7.2. SMIL Usage
   7.3. Finding Layout Values in a 3GP File
8. 3GPP Timed Text Media Type
9. SDP Usage
   9.1. Mapping to SDP
   9.2. Parameter Usage in the SDP Offer/Answer Model
        9.2.1. Unicast Usage
        9.2.2. Multicast Usage
   9.3. Offer/Answer Examples
   9.4. Parameter Usage outside of Offer/Answer
10. IANA Considerations
11. Security Considerations
12. References
    12.1. Normative References
    12.2. Informative References
13. Basics of the 3GP File Structure
14. Acknowledgements
1. Introduction
3GPP timed text is a media format for time-lined, decorated text specified in the 3GPP Technical Specification TS 26.245, "Transparent end-to-end packet switched streaming service (PSS); Timed Text Format (Release 6)" [1]. Besides plain text, the 3GPP timed text format allows the creation of decorated text such as that for karaoke applications, scrolling text for newscasts, or hyperlinked text. These contents may or may not be synchronized with other media, such as audio or video.
The purpose of this document is to provide a means to stream 3GPP timed text contents using RTP [3]. This includes the streaming of timed text being read out of a (3GP) file, as well as the streaming of timed text generated in real-time, a.k.a. live streaming.
Section 2 contains the motivation for this document, an overview of the media format, the requirements, and the design rationale. Section 3 defines the terminology used. Section 4 specifies the payload headers, the fragmentation and re-assembly rules for text samples, the rules for payload aggregation, and the relations of this document to RFC 3640 [12] and RFC 2793 [22]. Section 5 specifies some simple schemes for resilient transport and gives pointers to other possible mechanisms. Section 6 addresses congestion control. Section 7 specifies scene description. Section 8 defines the media type. Section 9 specifies SDP for unicast and multicast sessions, including usage in the Offer/Answer model [13]. Sections 10 and 11 address IANA and security considerations. Section 12 lists references. Basics of the 3GP File Structure are in Section 13.
2. Motivation, Requirements, and Design Rationale
2.1. Motivation
The 3GPP timed text format was developed for use in the services specified in the 3GPP Transparent End-to-end Packet-switched Streaming Services (3GPP PSS) specification [16].
As of today, PSS allows downloading 3GPP timed text contents stored in 3GP files. However, due to the lack of an RTP payload format, it is not possible to stream 3GPP timed text contents over RTP.
This document specifies such a payload format.
2.2. Basic Components of the 3GPP Timed Text Media Format
Before going into the details of the design, it is necessary to know how the media format is constructed. We can identify four differentiated functional components: layout information, default formatting, text strings, and decoration. In the following, we briefly explain these and match them to their designations in a 3GP file:
o Initial spatial layout information related to the text strings: These are the height and width of the text region where text is displayed, the position of the text region in the display, and the layer or proximity of the text to the user. In 3GP files, this information is contained in the Track Header Box (3GP file designations are capitalized for clarity).
o Default settings for formatting and positioning of text: style (font, size, color,...), background color, horizontal and vertical justification, line width, scrolling, etc. For 3GP files, this corresponds to the Sample Descriptions.
o The actual text strings: encoded characters using either UTF-8 [18] or UTF-16 [19] encoding.
o The decoration: If some characters have different style, delay, blink, etc., this needs to be indicated. The decoration is only present in the text samples if it is actually needed. Otherwise, the default settings as above apply. In 3GP files, within each Text Sample, the decoration (i.e., Modifier Boxes) is appended to the text strings, if needed. At the time of writing this payload format, the following modifiers are specified in the 3GPP timed text media format specification [1]:
- text highlight
- highlight color
- blinking text
- karaoke feature
- hyperlink
- text delay
- text style
- positioning of the text box
- text wrap indication
2.3. Requirements
Once the basic components are known, it is necessary to define which requirements the payload format shall fulfill:
1. It shall enable both live streaming and streaming from a 3GP file.
Informative note: For the purpose of this document, the term "live streaming" refers to those scenarios where the timed text stream is sent from a live encoder. Upon reception, the content may or may not be stored in a 3GP file. Typically, in live streaming applications, the sender encapsulates the timed text content in RTP packets following the guidelines given in this document. At the receiving side, a buffer is used to cancel the network delay and delay jitter. If receiver and sender support packet loss resilience mechanisms (see Section 5), it may also be possible to recover from packet losses. Note that how sender and receiver actually manage and dimension the buffers is an implementation design choice.
2. Furthermore, it shall be possible for an RTP receiver using this payload format, and capable of storing in 3GP format, to obtain all necessary information from the RTP packets for storing the received text contents according to the 3GP file format. This file may or may not be the same as the original file.
Informative note: The 3GP file format itself is based on the ISO Base Media File Format recommendation [2]. Section 13.1 gives some insight into the 3GP file structure. Further, Sections 4.3 and 7.3 specify where the information needed for filling in payload headers is found in a 3GP file. For live streaming, appropriate values complying with the format and units described in [1] shall be used. Where needed, clarifications on appropriate values are given in this document.
3. It shall enable efficient and resilient transport of timed text contents over RTP. In particular:
a. Enable the transmission of the sample descriptions by both out-of-band and in-band means. Sample descriptions are important information, which potentially apply to several text samples. These default formatting settings are typically transmitted out-of-band (reliably) once at the initialization phase. If additional sample descriptions are needed in the course of a session, these may also be sent out-of-band or in-band. In-band transmission, although unreliable, may be more appropriate for sending sample descriptions if these should be sent frequently, as opposed to establishing an additional communication channel for SDP, for example. It is also useful in cases where an out-of-band channel may not be available and for live streaming, where contents are not known a priori. Thus, the payload format shall enable out-of-band and in-band transmission of sample descriptions. Section 4.1.6 specifies a payload header for transmitting sample descriptions in-band. Section 9 specifies how sample descriptions are mapped to SDP.
b. Enable the fragmentation of a text sample into several RTP packets in order to cover a wide range of applications and network environments. In general, fragmentation should be a rare event, given the low bit rates and relatively small text sample sizes. However, the 3GPP Timed Text media format does allow for larger text samples. Therefore, the payload format shall take this into account and provide a means for coping with fragmentation and reassembly. Section 4.4 deals with fragmentation.
c. Enable the aggregation of units into an RTP packet for making the transport more efficient. In a mobile communication environment, a typical text sample size is around 100-200 bytes. If the available bit rate and the packet size allow it, units should be aggregated into one RTP packet. Section 4.6 deals with aggregation.
d. Enable the use of resilient transport mechanisms, such as repetition, retransmission [11], and FEC [7] (see Section 5). For a more general discussion, refer to RFC 2354 [8], which discusses available mechanisms for stream repair.
2.4. Limitations
The payload headers have been optimized in size for RTP. Instead of using 32-bit (S)LEN, SDUR, and SIDX header fields, which would carry many unused bits much of the time, it has been a design choice to reduce the size of these fields. As a consequence, this payload format has reduced maximum values with respect to sizes and durations of (text) samples and sample descriptions. These maximum values differ from those allowed in 3GP files, where they are expressed using 32-bit (unsigned) integers. In some cases, extension mechanisms are provided to deal with larger values. However, it is noted that the values used here should be enough for the streaming applications targeted.
The following limitations apply:
1. The maximum size of text samples carried in RTP packets is restricted to be a 16-bit (unsigned) integer (this includes the text strings and modifiers). This means a maximum size for the unit would be about 64 Kbytes. No extension mechanism is provided.
2. The sample description index values are restricted to be an 8-bit (unsigned) integer. An extension mechanism is given in Section 4.3.
3. The text sample duration is restricted to be a 24-bit (unsigned) integer. This yields a maximum duration at a timestamp clockrate of 1000 Hz of about 4.6 hours. Nevertheless, an extension mechanism is provided in Section 4.3.
4. Sample descriptions are also restricted in size: If the size cannot be expressed as a 16-bit (unsigned) integer, the sample description shall not be conveyed. As in the case of the sample size, no extension mechanism is provided.
5. A further limitation concerns the UTF-16 encodings supported: Only transport of text strings following big endian byte order is supported. See Section 4.1.1 for details.
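The field widths in the limitations above translate directly into maximum values. The following Python sketch works through that arithmetic. The constant and function names are illustrative only and do not appear in this specification.

```python
# Field widths as stated in the limitations above.
LEN_BITS = 16   # size of a text sample or sample description, in bytes
SIDX_BITS = 8   # sample description index
SDUR_BITS = 24  # text sample duration, in timestamp ticks


def max_unsigned(bits: int) -> int:
    """Largest value representable by an unsigned integer of this width."""
    return (1 << bits) - 1


def max_duration_hours(clockrate_hz: int) -> float:
    """Longest sample duration, in hours, at the given timestamp clockrate."""
    return max_unsigned(SDUR_BITS) / clockrate_hz / 3600.0


if __name__ == "__main__":
    print(max_unsigned(LEN_BITS))              # 65535 bytes, about 64 Kbytes
    print(max_unsigned(SIDX_BITS))             # 255
    print(round(max_duration_hours(1000), 2))  # 4.66 hours at 1000 Hz
```

A higher clockrate shortens the maximum duration proportionally, which is one reason the extension mechanism for durations exists.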
2.5. Design Rationale
The following design choices were made:
1. 'Unit' approach: The payload formats specified in this document follow a simple scheme: a 3-byte common header (Common Payload Header) followed by a specific header for each text sample (fragment) type. Following these headers, the text sample contents are placed (Section 4.1.1 and following). This structure is called a 'unit'.
The following units have been devised to comply with the requirements mentioned in Section 2.3:
a. A TYPE 1 unit that contains one complete text sample,
b. A TYPE 2 unit that contains a complete text string or a fragment thereof,
c. A TYPE 3 unit that contains the complete modifiers or only the first fragment thereof,
d. A TYPE 4 unit that contains one modifier fragment other than the first, and
e. A TYPE 5 unit that contains one sample description.
This 'unit' approach was motivated by the following reasons:
1. Allows a simple classification of the text samples and text sample fragments that can be conveyed by the payload format.
2. Enables easy interoperability with RFC 3640 [12]. During the development of this payload format, interest was shown from MPEG-4 standardization participants in developing a common payload structure for the transport of 3GPP Timed Text. While interoperability is not strictly necessary for this payload format to work, it has been pursued in this payload format. Section 4.8 explains how this is done.
2. Character count is not implemented. This payload format does detect lost text sample fragments, but it does not enable an RTP receiver to find out the exact number of text characters lost. In fact, the fragment size included in the payload headers does not help in finding the number of lost characters because the UTF-8/UTF-16 [18][19] encodings used yield a variable number of bytes per character.
For finding the exact number of lost characters, an additional field reflecting the character count (and possibly the character offset) upon fragmentation would be required. This would additionally require that the entity performing fragmentation count the characters included in each text fragment.
One benefit of having a character count would be that the display application would be able to replace missing characters through some other character representing character loss. For example:
If we take the string "Some text is lost now" and assume the loss of a packet containing the text in the middle, this could be displayed (with a character count):
"Some ############now"
As opposed to:
"Some #now"
which is what this payload format enables ("#" indicates a missing character or packet, respectively).
However, it is the consensus of the working group that, for applications that use this payload format, such as subtitling and multimedia presentations, such partial error correction is not worth the cost of including two additional fields, namely character count and character offset. Instead, it is recommended that some more overhead be invested to provide full error correction by protecting the text sample fragments using the measures outlined in Section 5.
3. Fragment re-assembly: In order to re-assemble the text samples, offset information is needed. Instead of a character or byte offset, a single byte, TOTAL/THIS, is used. These two values indicate the total number and current index of fragments of a text sample. This is simpler than having a character offset field in each fragment. Details are in Section 4.1.3.
4. A length field, LEN, is present in the common header fields. While the length in the RTP payload format is not needed by most RTP applications (typically lower layers, like UDP, provide this information), it does ease interoperability with RFC 3640. This is because the Access Units (AUs) used for carriage of data in RFC 3640 must include a length indication. Details are in Section 4.8.
5. The header fields in the specific payload headers (TYPE headers in Sections 4.1.2 to 4.1.6) have been arranged for easy processing on 32-bit machines. For this reason, the fields SIDX and SDUR are swapped in the TYPE 1 unit, compared to the other units.
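Design choice 2 above rests on the fact that the byte length of a fragment does not determine how many characters it carries. A minimal Python sketch of that point follows. The helper name and the sample strings are illustrative assumptions, not part of the payload format.

```python
from typing import Tuple


def encoded_sizes(text: str) -> Tuple[int, int, int]:
    """Return (code points, UTF-8 bytes, UTF-16 big-endian bytes) for text."""
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-16-be")),  # big endian, no byte order mark
    )


if __name__ == "__main__":
    # Three strings of three characters each, with different byte lengths.
    for sample in ("now", "n\u00f6w", "n\U0001F600w"):
        print(encoded_sizes(sample))
    # (3, 3, 6)
    # (3, 4, 6)
    # (3, 6, 8)
```

A receiver that only learns the size in bytes of a lost fragment therefore cannot tell how many placeholder characters to render, which is why a separate character count field would have been required.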
3. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [5].
Furthermore, the following terms are used and have specific meaning within the context of this document:
text sample or whole text sample
In the 3GPP Timed Text media format [1], these terms refer to a unit of timed text data as contained in the source (3GP) file. This includes the text string byte count, possibly a Byte Order Mark, the text string and any modifiers that may follow. Its equivalent in audio/video would be a frame.
In this document, however, a text sample contains only text strings followed by zero or more modifiers. This definition of text sample excludes the 16-bit text string byte count and the 16-bit Byte Order Mark (BOM) present in 3GP file text samples (see Section 4.3 and Figure 9). The 16-bit BOM is not transported in RTP, as explained in Section 4.1.1.
text strings
The actual text characters encoded either as UTF-8 or UTF-16. When using this payload format, the text string does not contain any byte order mark (BOM). See Figure 9 for details.
fragment or text sample fragment
A fraction of a text sample. A fragment may contain either text strings or modifier (decoration) contents, but not both at the same time.
sample contents
General term to identify timed text data transported when using this payload format. Sample contents may be one or several text samples, sample descriptions, and sample fragments (note that, as per Section 4.6, there is only one case in which more than one fragment may be included in a payload).
decoration or modifiers
These terms are used interchangeably throughout the document to denote the contents of the text sample that modify the default text formatting. Modifiers may, for example, specify different font size for a particular sequence of characters or define karaoke timing for the sample.
sample description
Information that is potentially shared by more than one text sample. In a 3GP file, a sample description is stored in a place where it can be shared. It contains setup and default information such as scrolling direction, text box position, delay value, default font, background color, etc.
units or transport units
The payload headers specified in this document encapsulate text samples, fragments thereof, and sample descriptions by placing a common header and specific payload header (Sections 4.1.1 to 4.1.6) before them, thus building what is here called a (transport) unit.
aggregation or aggregate packet
The payload of an aggregate (RTP) packet consists of several (transport) units.
track or stream
3GP files contain audio/video and text tracks. This document enables streaming of text tracks using RTP. Therefore, these terms are used interchangeably in this document in the context of 3GP files.
Media Header Box / Track Header Box / ...
The 3GP file format makes use of these structures defined in the ISO Base Media File Format [2]. When referring to these in this document, initials are capitalized for clarity.
4. RTP Payload Format for 3GPP Timed Text
The format of an RTP packet containing 3GPP timed text is shown below:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
  /+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | |U|   R   | TYPE|             LEN               |               :
 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :
U| :           (variable header fields depending on TYPE           :
N| :                                                               :
I< +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
T| |                                                               |
 | :                       SAMPLE CONTENTS                         :
 | |                                               +-+-+-+-+-+-+-+-+
 | |                                               |
  \+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1. 3GPP Timed Text RTP Packet Format
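As a rough illustration of Figure 1, the 3-byte common payload header at the start of each unit can be unpacked as sketched below in Python. The bit widths (U: 1 bit, R: 4 bits, TYPE: 3 bits, LEN: 16 bits) are read off the figure; the class and function names are hypothetical. The meaning of each field is defined in Section 4.1.1 and is not interpreted here.

```python
import struct
from typing import NamedTuple


class CommonHeader(NamedTuple):
    u: int          # 1 bit
    r: int          # 4 bits
    unit_type: int  # 3 bits (TYPE)
    length: int     # 16 bits (LEN), network byte order


def parse_common_header(unit: bytes) -> CommonHeader:
    """Split the first three bytes of a unit into U, R, TYPE, and LEN."""
    if len(unit) < 3:
        raise ValueError("a unit needs at least 3 bytes of common header")
    first, length = struct.unpack("!BH", unit[:3])
    return CommonHeader(
        u=first >> 7,
        r=(first >> 3) & 0x0F,
        unit_type=first & 0x07,
        length=length,
    )


if __name__ == "__main__":
    # 0x81 = 1 0000 001 in binary: U=1, R=0, TYPE=1; LEN=0x0010.
    print(parse_common_header(bytes([0x81, 0x00, 0x10])))
```

The input is assumed to begin at the first byte after the 12-byte fixed RTP header (or after any CSRC list and header extension, if present).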
Marker bit (M): The marker bit SHALL be set to 1 if the RTP packet includes one or more whole text samples or the last fragment of a text sample; otherwise, it is set to zero (0).
Timestamp: The timestamp MUST indicate the sampling instant of the earliest (or only) unit contained in the RTP packet. The initial value SHOULD be randomly determined, as specified in RTP [3].
The timestamp value should provide enough timing resolution for expressing the duration of text samples, for synchronizing text with other media, and for performing RTP Control Protocol (RTCP) measurements such as the interarrival delay jitter or the RTCP Packet Receipt Times Report Block (Section 4.3 of RFC 3611 [20]). This is compliant with Section 5.1 of RTP [3]:
"The resolution of the clock MUST be sufficient for the desired synchronization accuracy and for measuring packet arrival jitter (one tick per video frame is typically not sufficient)".
The above observation applies to both timed text tracks included in a 3GP file and live streaming sessions. In the case of a 3GP timed text track, the timestamp clockrate is the value of the "timescale" parameter in the Media Header Box for that text track. Each track in a 3GP file MAY have its own clockrate as specified in the Media Header Box. Likewise, live streaming applications SHALL use an appropriate timestamp clockrate. A default value of 1000 Hz is RECOMMENDED. Other timestamp clockrates MAY be used; in that case, the typical behavior is to match the 3GPP timed text clockrate to that used by an associated audio or video stream.
In an aggregate payload, units MUST be placed in play-out order, i.e., earliest first in the payload. If TYPE 1 units are aggregated, the timestamp of the subsequent units MUST be obtained by adding the timed text sample duration of previous samples to the RTP timestamp value. There are two exceptions to this rule: TYPE 5 units and an aggregate payload containing two fragments of the same text sample. The details of the timestamp calculation are given in Section 4.6.
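The timestamp rule for aggregated TYPE 1 units can be sketched as follows. This is only an illustration under stated assumptions: the function name `unit_timestamps` and the plain list of SDUR values are hypothetical, and the two exceptions named above (TYPE 5 units and two fragments of the same text sample) are deliberately left out.

```python
def unit_timestamps(rtp_timestamp, sdurs):
    """Return the timestamp of each aggregated TYPE 1 unit.

    rtp_timestamp: timestamp of the RTP header (applies to the first unit).
    sdurs: SDUR of each unit, in play-out order, in timestamp ticks.
    """
    timestamps = []
    current = rtp_timestamp
    for sdur in sdurs:
        timestamps.append(current)
        # Add the duration of the previous sample; RTP timestamps are
        # 32 bits wide and wrap around.
        current = (current + sdur) & 0xFFFFFFFF
    return timestamps
```

For example, three units with durations of 2000, 500, and 300 ticks in a packet whose RTP timestamp is 1000 start at 1000, 3000, and 3500.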
Finally, timestamp clockrates MUST be signaled by out-of-band means at session setup, e.g., using the media type "rate" parameter in SDP. See Section 9 for details.
Payload Type (PT): The payload type is set dynamically and sent by out-of-band means.
The usage of the remaining RTP header fields (namely, V, P, X, CC, SN and SSRC) follows the rules of RTP and the profile in use.
The (transport) units specified in this document consist of a set of common fields (U, R, TYPE, LEN), followed by specific header fields (TYPES 1-5) and text sample contents. See Figure 1 and Figure 2.
In Figure 2, two example RTP packets are depicted. The first contains an aggregate RTP payload with two complete text samples, and the second contains one text sample fragment. After each unit header is explained, detailed payload examples follow in Section 4.7.
                   +----------------------+
                   |                      |
                   |      RTP Header      |
                   |                      |
          ---------+----------------------+
          |        |                      |
          |        |COMMON + TYPE 1 Header|
          |        ........................
   UNIT 1 -        |                      |
          |        |     Text Sample      |
          |        |                      |
          |-------\........................
           -------/|                      |
          |        |COMMON + TYPE 1 Header|
          |        ........................
   UNIT 2 -        |                      |
          |        |     Text Sample      |
          |        |                      |
          |        |                      |
          ---------+----------------------+

                   +----------------------+
                   |                      |
                   |      RTP Header      |
                   |                      |
          ---------+----------------------+
          |        |   COMMON + TYPE 2    |
          |        |   (or 3 or 4) Hdr    |
          |        ........................
   UNIT 3 -        |                      |
          |        | Text Sample Fragment |
          |        |                      |
          |        |                      |
          ---------+----------------------+
Figure 2. Example RTP packets
The fields common to all payload headers have the following format:
    0                   1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |U|   R   |TYPE |              LEN              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3. Common payload header fields
Where:
o U (1 bit) "UTF Transformation flag": This is used to inform RTP receivers whether UTF-8 (U=0) or UTF-16 (U=1) was used to encode the text string. UTF-16 text strings transported by this payload format MUST be serialized in big endian order, a.k.a. network byte order.
Informative note: Timed text clients complying with the 3GPP Timed Text format [1] are only required to understand the big endian serialization. Thus, in order to ease interoperability, the reverse serialization (little endian) is not supported by this payload format.
For the payload formats defined in this document, the U bit is only used in TYPE 1 and TYPE 2 headers. Senders MUST set the U bit to zero in TYPE 3, TYPE 4, and TYPE 5 headers. Consequently, receivers MUST ignore the U bit in TYPE 3, TYPE 4, and TYPE 5 headers.
o R (4 bits) "Reserved bits": for future extensions. This field MUST be set to zero (0x0) and MUST be ignored by receivers.
o TYPE (3 bits) "Type Field": This field specifies which specific header fields follow. The following TYPE values are defined:
   - TYPE 1, for a whole text sample.
   - TYPE 2, for a text string fragment (without modifiers).
   - TYPE 3, for a whole modifier box or the first fragment of a modifier box.
   - TYPE 4, for a modifier fragment other than the first.
   - TYPE 5, for a sample description. Exactly one header per sample description.
   - TYPE 0, 6, and 7 are reserved for future extensions. Note that future extensions are possible, e.g., a unit that explicitly signals the number of characters present in a fragment (see Section 2.5). In order to guarantee backwards-compatibility, it SHALL be possible that older clients ignore (newer) units they do not understand, without invalidating the timestamp calculation mechanisms or otherwise preventing them from decoding the other units.
o LEN (16 bits) "Length Field": indicates the size (in bytes) of the LEN field itself and of everything that follows it in the unit, i.e., the remaining header fields and the unit payload (text strings and modifiers, if any). Only the initial U/R/TYPE byte of the common header is excluded. The LEN field is in network byte order.
The way in which LEN is obtained when streaming out of a 3GP file depends on the particular unit type. This is explained for each unit in the sections below.
For live streaming, both sample length and the LEN value for the current fragment MUST be calculated during the sampling process or during fragmentation.
In general, LEN may take the following values:
   - TYPE = 1, LEN >= 8
   - TYPE = 2, LEN > 9
   - TYPE = 3, LEN > 6
   - TYPE = 4, LEN > 6
   - TYPE = 5, LEN > 3
Receivers MUST discard units that do not comply with these values. However, the RTP header fields and the rest of the units in the payload (if any) are still useful, as guaranteed by the requirement for future extensions above.
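The common header layout and the LEN checks above can be sketched as follows. This is a minimal illustration, not a reference receiver: the names `MIN_LEN`, `parse_common_header`, and `split_units` are assumptions, and skipping units of a reserved TYPE (0, 6, 7) is one possible reading of the backwards-compatibility requirement.

```python
import struct

# Smallest valid LEN per TYPE (TYPE 1: LEN >= 8; the others are strict bounds).
MIN_LEN = {1: 8, 2: 10, 3: 7, 4: 7, 5: 4}


def parse_common_header(buf, offset=0):
    """Return (u, r, unit_type, length) read from the 3 common header bytes."""
    first, length = struct.unpack_from("!BH", buf, offset)
    u = first >> 7              # 1 bit: 0 = UTF-8, 1 = UTF-16
    r = (first >> 3) & 0x0F     # 4 reserved bits, ignored by receivers
    unit_type = first & 0x07    # 3 bits
    return u, r, unit_type, length


def split_units(payload):
    """Yield (unit_type, unit_bytes) for every acceptable unit in a payload."""
    offset = 0
    while offset + 3 <= len(payload):
        _, _, unit_type, length = parse_common_header(payload, offset)
        end = offset + 1 + length        # LEN excludes the U/R/TYPE byte
        if length < 2 or end > len(payload):
            break                        # the next unit cannot be located
        if length >= MIN_LEN.get(unit_type, 0x10000):
            yield unit_type, payload[offset:end]
        # Units with a bad LEN or a reserved TYPE are skipped, not fatal.
        offset = end
```

Because LEN always locates the start of the next unit, a discarded unit does not prevent the remaining units of the payload from being processed.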
In the following subsections the different payload headers for the values of TYPE are specified.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |U|   R   |TYPE |       LEN (always >=8)        |     SIDX      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SDUR                      |     TLEN      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     TLEN      |
   +-+-+-+-+-+-+-+-+
Figure 4. TYPE 1 Header Format
This header type is used to transport whole text samples. This unit should be the most common case, i.e., the text sample should usually be small enough to be transported in one unit without having to separate text strings from modifiers. In an aggregate (RTP packet) payload containing several text samples, every sample is preceded by its own TYPE 1 header (see Figure 12).
Informative note: As indicated in Section 3, "Terminology", a text sample is composed of the text strings followed by the modifiers (if any). This is also how text samples are stored in 3GP files. The separation of a text sample into text strings and modifiers is only needed for large samples (or small available IP MTU sizes; see Section 4.4), and it is accomplished with TYPE 2 and TYPE 3 headers, as explained in the sections below.
Note also that empty text samples are considered whole text samples, although they do not contain sample contents. Empty text samples may be used to clear the display or to put an end to samples of unknown duration, for example. Units without sample contents SHALL have a LEN field value of 8 (0x0008).
The fields above have the following meaning:
o U, R, and TYPE, as defined in Section 4.1.1.
o LEN, in this case, represents the length of the (complete) text sample plus eight (8) bytes of headers. For finding the length of the text sample in the Sample Size Box of 3GP files, see Section 4.3.
o SIDX (8 bits) "Text Sample Entry Index": This is an index used to identify the sample descriptions.
The SIDX field is used to find the sample description corresponding to the unit's payload. There are two types of SIDX values: static and dynamic.
Static SIDX values are used to identify sample descriptions that MUST be sent out-of-band and MUST remain active during the whole session. A static SIDX value is unequivocally linked to one particular sample description during the whole session. Carrying many sample descriptions out-of-band SHOULD be avoided, since these may become large and, ultimately, transport is not the goal of the out-of-band channel. Thus, this feature is RECOMMENDED for transporting those sample descriptions that provide a set of minimum default format settings. Static SIDX values MUST fall in the (closed) interval [129,254].
Dynamic SIDX values are used for sample descriptions sent in-band. Sample descriptions MAY be sent in-band for several reasons: because they are generated in real time, for transport resiliency, or both. A dynamic SIDX value is unequivocally linked to one particular sample description during the period in which this is active in the session, and it SHALL NOT be modified during that period. This period MAY be smaller than or equal to the session duration. This period is not known a priori. A maximum of 64 simultaneously active dynamic SIDX values is allowed at any moment. Dynamic SIDX values MUST fall in the closed interval [0,127]. This should be enough for both recorded content and live streaming applications. Nevertheless, a wraparound mechanism is provided in Section 4.2.1 to handle streaming sessions where more than 64 SIDX values might be needed. Servers MAY make use of dynamic sample descriptions. Clients MUST be able to receive and interpret dynamic sample descriptions.
Finally, SIDX values 128 and 255 are reserved for future use.
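The three SIDX ranges given above can be summarized in a few lines. The function name `classify_sidx` and the returned labels are assumptions chosen for this sketch only.

```python
def classify_sidx(sidx):
    """Map an 8-bit SIDX value to 'dynamic', 'static' or 'reserved'."""
    if not 0 <= sidx <= 255:
        raise ValueError("SIDX is an 8-bit field")
    if sidx <= 127:
        return "dynamic"      # sample description sent in-band
    if sidx in (128, 255):
        return "reserved"     # reserved for future use
    return "static"           # [129,254], sample description sent out-of-band
```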
o SDUR (24 bits) "Text Sample Duration": indicates the sample duration in RTP timestamp units of the text sample. For this field, a length of 3 bytes is preferred to 2 bytes. This is because, for a typical clockrate of 1000 Hz, 16 bits would allow for a maximum duration of just 65 seconds, which might be too short for some streams. On the other hand, 24 bits at 1000 Hz allow for a maximum duration of about 4.6 hours, while for 90 kHz, this value is about 3 minutes. These values should be enough for streaming applications. However, if a larger duration is needed, the extension mechanism specified in Section 4.3 SHALL be used.
Apart from defining the time period during which the text is displayed, the duration field is also used to find the timestamp of subsequent units within the aggregate RTP packet payload (if any). This is explained in Section 4.6.
Text samples generally have a known duration at the time of transmission. However, in some cases such as live streaming, the time for which a text piece shall be presented might not be known a priori. Thus, the value SDUR=0 (0x000000) is reserved to signal unknown duration. The amount of time that a sample of unknown duration is presented is determined by the timestamp of the next sample that shall be displayed at the receiver: Text samples of unknown duration SHALL be displayed until the next text sample becomes active, as indicated by its timestamp.
The next example illustrates how units of unknown duration MUST be presented. If no subsequent text sample is available, what is displayed is an implementation issue. For example, a server could send an empty sample to clear the text box.
Example: Imagine you are in an airport watching the latest news report while you wait for your plane. Airports are loud, so the news report is transcribed in the lower area of the screen. This area displays two lines of text: the headlines and the words spoken by the news speaker. As usual, the headlines are shown for a longer time than the rest. This time is, in principle, unknown to the stream server, which is streaming live. A headline is replaced only when the next headline is received.
However, upon storing a text sample with SDUR=0 in a 3GP file, the SDUR value MUST be changed to the effective duration of the text sample, which MUST be always greater than zero (note that the ISO file format [2] explicitly forbids a sample duration of zero). The effective duration MUST be calculated as the timestamp difference between the current sample (with unknown duration) and the next text sample that is displayed.
Note that samples of unknown duration SHALL NOT use features that require knowledge of the duration of the sample up front. Such features are scrolling and karaoke in [1]. This also applies to future extensions of the Timed Text format. Furthermore, only sample descriptions (TYPE 5 units) MAY follow units of unknown duration in the same aggregate payload. Otherwise, it would not be possible to calculate the timestamp of these other units.
For text contents stored in 3GP files, see Section 4.3 for details on how to extract the duration value. For live streaming, live encoders SHALL assign appropriate values and units according to [1] and later releases.
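The SDUR arithmetic above, and the effective duration to be written when a sample of unknown duration is stored, can be checked with a short sketch. Both helper names are assumptions made for illustration; `effective_duration` additionally assumes that the two timestamps use the same clockrate and lie less than one 32-bit wraparound apart.

```python
def max_duration_seconds(field_bits, clockrate_hz):
    """Longest duration an unsigned field of field_bits bits can express."""
    return (2 ** field_bits - 1) / clockrate_hz


def effective_duration(current_ts, next_ts):
    """Duration to store for a sample that was streamed with SDUR=0.

    Computed as the timestamp difference to the next displayed sample,
    modulo 2**32 because RTP timestamps wrap around.
    """
    duration = (next_ts - current_ts) & 0xFFFFFFFF
    if duration == 0:
        raise ValueError("a stored sample duration must be greater than zero")
    return duration


print(max_duration_seconds(16, 1000))          # roughly 65 seconds
print(max_duration_seconds(24, 1000) / 3600)   # roughly 4.66 hours
print(max_duration_seconds(24, 90000) / 60)    # roughly 3.1 minutes
```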
o TLEN (16 bits), "Text String Length", is a byte count of the text string. The decoder needs the text string length in order to know where the modifiers in the payload start. TLEN is not present in text string fragments (TYPE 2) since it can be deductively calculated from the LEN values of each fragment.
The TLEN value is obtained from the text samples as contained in 3GP files. Refer to Section 4.3. For live content, the TLEN MUST be obtained during the sampling process.
o Finally, the actual text sample is placed after the TLEN field. As defined in Section 3, a text sample consists of a string of characters encoded using either UTF-8 or UTF-16, followed by zero or more modifiers. Note also that no BOM and no byte count are included in the strings carried in the payload (as opposed to text samples stored in 3GP files [1]).
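The TYPE 1 layout described above can be exercised with a small serializer and parser. This is a sketch under stated assumptions: `build_type1` and `parse_type1` are hypothetical names, modifiers are treated as opaque bytes, and the parser expects a complete unit that has already been cut out of the payload using LEN.

```python
import struct


def build_type1(text, modifiers=b"", sidx=0, sdur=0, utf16=False):
    """Serialize one whole text sample as a TYPE 1 unit."""
    # Strings carry no BOM and no byte count; UTF-16 is big endian.
    string = text.encode("utf-16-be" if utf16 else "utf-8")
    length = 8 + len(string) + len(modifiers)   # LEN: all but the first byte
    first = (int(utf16) << 7) | 1               # U bit, R = 0, TYPE = 1
    return (struct.pack("!BHB", first, length, sidx)
            + sdur.to_bytes(3, "big")           # SDUR is 24 bits
            + struct.pack("!H", len(string))    # TLEN
            + string + modifiers)


def parse_type1(unit):
    """Split a TYPE 1 unit into its header fields, text and modifiers."""
    first, length, sidx = struct.unpack_from("!BHB", unit, 0)
    sdur = int.from_bytes(unit[4:7], "big")
    (tlen,) = struct.unpack_from("!H", unit, 7)
    text_end = 9 + tlen                         # modifiers start here
    encoding = "utf-16-be" if first >> 7 else "utf-8"
    return {
        "sidx": sidx,
        "sdur": sdur,
        "text": unit[9:text_end].decode(encoding),
        "modifiers": unit[text_end:1 + length],
    }
```

An empty text sample built this way has LEN equal to 8, matching the rule for units without sample contents.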
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |U|   R   |TYPE |        LEN (always >9)        | TOTAL | THIS  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SDUR                      |     SIDX      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             SLEN              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5. TYPE 2 Header Format
This header type is used to transport either a whole text string or a fragment of it. TYPE 2 units SHALL NOT contain modifiers. In detail:
o U, R, and TYPE, as defined in Section 4.1.1.
o SIDX and SDUR, as defined in Section 4.1.2.
Note that the U, SIDX, and SDUR fields are meaningful since partial text strings can also be displayed.
o The LEN field (16 bits) indicates the length of the text string fragment plus nine (9) bytes of headers. Its value is calculated upon fragmentation. LEN MUST always be greater than nine (0x0009). Otherwise, the unit MUST be discarded.
According to the guidelines in Section 4.4, text strings MUST be split at character boundaries to allow the display of text fragments. Therefore, a text fragment MUST contain at least one character in either UTF-8 or UTF-16. In practice this is a formality, since following the guidelines should produce much larger fragments.
Note also that TYPE 2 units do not contain an explicit text string length, TLEN (see TYPE 1). This is because TYPE 2 units do not contain any modifiers after the text string. If needed, the length of the received string can be obtained using the LEN values of the TYPE 2 units.
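Splitting at character boundaries can be illustrated for UTF-8 as below. This is a sketch, not the fragmentation procedure of Section 4.4: the name `split_utf8` is an assumption, "character" is taken to mean one encoded code point, and UTF-16 surrogate pairs would need an analogous check that is not shown here.

```python
def split_utf8(data, max_bytes):
    """Split UTF-8 bytes into chunks of at most max_bytes bytes each,
    never cutting inside an encoded character."""
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + max_bytes, len(data))
        # Back up while the byte at 'end' is a continuation byte (10xxxxxx),
        # which would mean the cut falls inside a character.
        while start < end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        if end == start:
            raise ValueError("max_bytes is smaller than one character")
        chunks.append(data[start:end])
        start = end
    return chunks
```

Each chunk decodes on its own, so a receiver can display a fragment without waiting for its neighbors; the length of the received string is simply the sum of the chunk lengths.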
o The SLEN field (16 bits) indicates the size (in bytes) of the original (whole) text sample to which this fragment belongs. This length comprises the text string plus any modifier boxes present (and includes neither the byte order mark nor the text string length as mentioned in Section 3, "Terminology").
Regarding the text sample length: Timed text samples are not generated at regular intervals, nor is there a default sample size. If 3GP files are streamed, the length of the text samples is calculated beforehand and included in the track itself, while for live encoding it is the real time encoder that SHALL choose an appropriate size for each text sample. In this case, the amount of text 'captured' in a sample depends on the text source and the particular application (see examples below). Samples may, e.g., be tailored to match the packet MTU as closely as possible or to provide a given redundancy for the available bit rate. The encoding application MUST also take into account the delay constraints of the real-time session and assess whether FEC, retransmission, or other similar techniques are reasonable options for stream repair.
The following examples illustrate how a real-time encoder may choose its settings to adapt to the scenario constraints.
Example: Imagine a newscast scenario, where the spoken news is transcribed and synchronized with the image and voice of the reporter. We assume that the news speaker talks at an average speed of 5 words per second with an average word length of 5 characters plus one space per word, i.e., 30 characters per second. We assume an available IP MTU of 576 bytes and an available bitrate of 576*8 bits per second = 4.6 Kbps. We assume each character can be encoded using 2 bytes in UTF-16. In this scenario, several constraints may apply; for example: available IP MTU, available bandwidth, allowable delay, and required redundancy. If the target were to minimize the packet overhead, a text sample covering 8 seconds of text would be closest to the IP MTU:

   IP/UDP/RTP/TYPE1 Header + (8-second text sample)
   = 20 + 8 + 12 + 8 + (~6 chars/word * 5 words/s * 8 s * 2 bytes/char)
   = 528 bytes < 576 bytes
For other scenarios, like lossy networks, it may happen that just one packet per sample is too low a redundancy. In this case, a choice could be that the encoder 'collects' text every second, thus yielding text samples (TYPE 1 units) of 68 bytes, TYPE 1 header included. We can, e.g., include three contiguous text samples in one RTP payload: the current and last two text samples (see below). This amounts to a total IP packet size of 20 + 8 + 12 + 3*(8 + 60) = 244 bytes. Now, with the same available bitrate of 4.6 Kbps, these 244-byte packets can be sent redundantly up to two times per second:
   RTP payload (1,2,3)(1,2,3) (2,3,4)(2,3,4) (3,4,5)(3,4,5) ...
   Time:       <----1s------> <----1s------> <-----1s-----> ...
This means that each text sample is sent at least six times, which should provide enough redundancy. Although not as bandwidth efficient (488*8 < 528*8 < 576*8 bps) as the previous packetization, this option increases the stream redundancy while still meeting the delay and bandwidth constraints.
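The arithmetic of the newscast example can be reproduced with the sketch below. The constant and function names are assumptions for illustration, and the header sizes (including the 8 bytes counted for the TYPE 1 header) are taken from the example as written.

```python
IP_HDR, UDP_HDR, RTP_HDR, TYPE1_HDR = 20, 8, 12, 8   # sizes used in the example
MTU = 576                       # bytes
BITRATE = MTU * 8               # 4608 bit/s, about 4.6 Kbps
BYTES_PER_SECOND = 6 * 5 * 2    # chars/word * words/s * bytes/char


def packet_size(samples, seconds_per_sample):
    """IP packet size for 'samples' TYPE 1 units of equal duration."""
    unit = TYPE1_HDR + seconds_per_sample * BYTES_PER_SECOND
    return IP_HDR + UDP_HDR + RTP_HDR + samples * unit


def redundant_schedule(num_samples, window=3, repeats=2):
    """Packets as tuples of sample numbers: a sliding window of the
    last 'window' samples, each packet sent 'repeats' times."""
    packets = []
    for newest in range(window, num_samples + 1):
        group = tuple(range(newest - window + 1, newest + 1))
        packets.extend([group] * repeats)
    return packets


print(packet_size(1, 8))           # single 8-second sample
print(packet_size(3, 1))           # three 1-second samples
print(redundant_schedule(5))       # the packet sequence shown above
```

In steady state every sample appears in three consecutive windows, each sent twice, which gives the six transmissions mentioned above; only the very first samples of a session are sent fewer times in this sketch.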
Another example would be a user sending timed text from a type-in area in the display. In this case, the text sample is created as soon as the user clicks the 'send' button. Depending on the packet length, fragmentation may be needed.
In a video conferencing application, text is synchronized with audio and video. Thus, the text samples shall be displayed long enough to be read by a human, shall fit in the video screen, and shall 'capture' the audio contents rendered during the time the corresponding video and audio is rendered.
For stored content, see Section 4.3 for details on how to find the SLEN value in a 3GP file. For live content, the SLEN MUST be obtained during the sampling process.
Finally, note that clients MAY use SLEN to buffer space for the remaining fragments of a text sample.
o The fields TOTAL (4 bits) and THIS (4 bits) indicate, respectively, the total number of fragments into which the original text sample (i.e., the text string and its modifiers) has been fragmented and the position of the current fragment in that sequence. Note that the sequence number alone cannot replace the functionality of the THIS field, since packets (and fragments) may be repeated, e.g., as in repeated transmission (see Section 5). Thus, an indication for "fragment offset" is needed.
The usual "byte offset" field is not used here for two reasons: a) it would take one more byte and b) it does not provide any information on the character offset. UTF-8/UTF-16 text strings have, in general, a variable character length ranging from 1 to 6 bytes. Therefore, the TOTAL/THIS solution is preferred. It could also be argued that the LEN and SLEN fields be used for this purpose, but while they would provide information about the completeness of the text sample, they do not specify the order of the fragments.
In all cases (TYPEs 2, 3 and 4), if the value of THIS is greater than TOTAL or if TOTAL equals zero (0x0), the fragment SHALL be discarded.
o Finally, the sample contents following the SLEN field consist of a fragment of the UTF-8/UTF-16 character string; no modifiers follow.
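A receiver-side view of the TYPE 2 header, including the TOTAL/THIS and LEN checks, might look like the sketch below. The name `parse_type2` and the returned dictionary are assumptions; the function expects one complete unit and returns None for a unit that has to be discarded.

```python
import struct


def parse_type2(unit):
    """Parse a TYPE 2 unit; return None if it has to be discarded."""
    first, length, total_this = struct.unpack_from("!BHB", unit, 0)
    total, this = total_this >> 4, total_this & 0x0F
    sdur = int.from_bytes(unit[4:7], "big")
    sidx, slen = struct.unpack_from("!BH", unit, 7)
    if length <= 9 or total == 0 or this > total:
        return None
    encoding = "utf-16-be" if first >> 7 else "utf-8"
    return {
        "total": total, "this": this, "sdur": sdur, "sidx": sidx,
        "slen": slen,
        # Fragments are split at character boundaries, so each one
        # can be decoded without its neighbors.
        "text": unit[10:1 + length].decode(encoding),
    }
```

The length of the fragment itself is LEN minus nine, which is how the total string length can be recovered from the LEN values of the TYPE 2 units.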
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |U|   R   |TYPE |        LEN (always >6)        | TOTAL | THIS  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SDUR                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6. TYPE 3 Header Format
This header type is used to transport either the entire modifier contents present in a text sample or just the first fragment of them. This depends on whether the modifier boxes fit in the current RTP payload.
If a text sample containing modifiers is fragmented, this header MUST be used to transport the first fragment or, if possible, the complete modifiers.
In detail:
o The U, R, and TYPE fields are defined as in Section 4.1.1.
o LEN indicates the length of the modifier contents. Its value is obtained upon fragmentation. Additionally, the LEN field MUST be greater than six (0x0006). Otherwise, the unit MUST be discarded.
o The TOTAL/THIS field has the same meaning as for TYPE 2.
For TYPE 3 units containing the last (trailing) modifier fragment, the value of TOTAL MUST be equal to that of THIS (TOTAL=THIS). In addition, TOTAL=THIS MUST be greater than one, because the total number of fragments of a text sample is logically always larger than one.
Otherwise, if TOTAL is different from THIS in a TYPE 3 unit, this means that the unit contains the first fragment of the modifiers.
o The SDUR has the same definition as for TYPE 1. Since the fragments are always transported in their own RTP packets, this field is only needed to know how long this fragment is valid. This may, e.g., be used to determine how long it should be kept in the display buffer.
Note that the SLEN and SIDX fields are not present in TYPE 3 unit headers. This is because a) these fragments do not contain text strings and b) these types of fragments are applied over text string fragments, which already contain this information.
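The way a receiver can interpret TOTAL and THIS in a TYPE 3 unit is sketched below. The function name and the returned labels are assumptions; treating TOTAL=THIS=1 as "invalid" is one possible reaction to a violation of the "greater than one" rule, which the text does not prescribe.

```python
def type3_role(total, this):
    """Classify a TYPE 3 unit from its TOTAL and THIS fields."""
    if total == 0 or this > total:
        return "discard"          # rule common to TYPEs 2, 3 and 4
    if total == this:
        # Trailing fragment of the sample: the unit carries all modifiers.
        return "complete" if total > 1 else "invalid"
    return "first"                # more modifier fragments (TYPE 4) follow
```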
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |U|   R   |TYPE |        LEN (always >6)        | TOTAL | THIS  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SDUR                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 7. TYPE 4 Header Format
This header type is placed before modifier fragments, other than the first one.
The U, R, and TYPE fields are used as per Section 4.1.1.
根据第4.1.1节使用U、R和类型字段。
As for TYPE 3, LEN indicates the length of the modifier contents and SHALL also be obtained upon fragmentation. The LEN field MUST be greater than six (0x0006). Otherwise, the unit MUST be discarded.
LEN表示类型3中改性剂含量的长度,也应在破碎时获得。LEN字段必须大于六(0x0006)。否则,必须丢弃该装置。
TOTAL/THIS is used as in TYPE 2.
总计/这与类型2中的使用相同。
The SDUR field is defined as in TYPE 1. The reasoning behind the absence of SLEN and SIDX is the same as in TYPE 3 units.
SDUR字段定义为类型1。SLEN和SIDX缺失的原因与3型机组相同。
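The header layout of Figure 7 can be sketched as a small parser. This is only an illustration, not part of the payload format: the names `FragmentHeader` and `parse_modifier_fragment_header` are hypothetical, and the field widths (U: 1 bit, R: 4 bits, TYPE: 3 bits, LEN: 16 bits, TOTAL and THIS: 4 bits each, SDUR: 24 bits) are read off the bit ruler of the figure.

```python
import struct
from typing import NamedTuple


class FragmentHeader(NamedTuple):
    """Fields of a TYPE 3 / TYPE 4 unit header (hypothetical helper type)."""
    u: int          # U bit (see Section 4.1.1)
    r: int          # R bits (see Section 4.1.1)
    unit_type: int  # TYPE field
    length: int     # LEN field
    total: int      # TOTAL field
    this: int       # THIS field
    sdur: int       # SDUR field, in timestamp units


def parse_modifier_fragment_header(unit: bytes) -> FragmentHeader:
    """Parse the 7-byte header that precedes a modifier fragment."""
    if len(unit) < 7:
        raise ValueError("unit is shorter than the 7-byte header")
    # Network byte order: 1 byte U/R/TYPE, 2 bytes LEN, 1 byte TOTAL/THIS.
    first, length, total_this = struct.unpack_from("!BHB", unit, 0)
    sdur = int.from_bytes(unit[4:7], "big")  # 24-bit duration
    header = FragmentHeader(
        u=first >> 7,
        r=(first >> 3) & 0x0F,
        unit_type=first & 0x07,
        length=length,
        total=total_this >> 4,
        this=total_this & 0x0F,
        sdur=sdur,
    )
    if header.unit_type not in (3, 4):
        raise ValueError("not a modifier fragment (TYPE 3 or TYPE 4)")
    if header.length <= 6:
        # LEN MUST be greater than six; otherwise the unit is discarded.
        raise ValueError("LEN must be greater than 6; discard the unit")
    return header
```

A receiver would typically treat the `ValueError` as "discard this unit" and continue with the remaining units of the payload.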
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |U|   R   |TYPE |        LEN( always >3)        |     SIDX      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 8. TYPE 5 Header Format
图8。类型5标题格式
This header type is used to transport (dynamic) sample descriptions. Every sample description MUST have its own TYPE 5 header.
此标题类型用于传输(动态)样本描述。每个示例说明都必须有自己的类型5标题。
The U, R, and TYPE fields are used as per Section 4.1.1.
根据第4.1.1节使用U、R和类型字段。
The LEN field indicates the length of the sample description, plus three bytes accounting for the SIDX field (one byte) and the LEN field itself (two bytes). Thus, this field MUST be greater than three (0x0003). Otherwise, the unit MUST be discarded.
LEN字段表示示例描述的长度,加上三个单位,分别表示SIDX和LEN字段本身。因此,该字段必须大于三(0x0003)。否则,必须丢弃该装置。
If the sample is streamed from a 3GP file, the length of the sample description contents (i.e., what comes after SIDX in the unit itself) is obtained from the file (see Section 4.3).
如果样本是从3GP文件流式传输的,则样本描述内容的长度(即装置本身中SIDX之后的内容)是从该文件中获得的(参见第4.3节)。
The SIDX field contains a dynamic SIDX value assigned to the sample description carried as sample content of this unit. As only dynamic sample descriptions are carried using TYPE 5, the possible SIDX values are in the (closed) interval [0,127].
SIDX字段包含一个动态SIDX值,该值分配给作为本单元样本内容携带的样本描述。由于仅使用类型5进行动态样本描述,因此可能的SIDX值在(闭合)区间[0127]内。
Senders MAY make use of TYPE 5 units. All receivers MUST implement support for TYPE 5 units, since it adds minimum complexity and may increase the robustness of the streaming session.
发送方可使用5类装置。所有接收器必须实现对类型5单元的支持,因为它增加了最小的复杂性,并可能增加流会话的健壮性。
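As an illustration of the TYPE 5 layout in Figure 8, a sender-side sketch could build a unit as follows. The function name `build_type5_unit` is hypothetical, the U and R bits are simply set to zero here (their use is defined in Section 4.1.1), and the field widths are taken from the figure.

```python
import struct


def build_type5_unit(sidx: int, sample_description: bytes) -> bytes:
    """Serialize one dynamic sample description as a TYPE 5 unit (sketch)."""
    if not 0 <= sidx <= 127:
        raise ValueError("dynamic SIDX values lie in the interval [0,127]")
    if not sample_description:
        raise ValueError("LEN must be greater than 3, so contents are required")
    # LEN covers the sample description plus SIDX (1 byte) and LEN (2 bytes).
    length = len(sample_description) + 3
    if length > 0xFFFF:
        raise ValueError("sample description does not fit the 16-bit LEN field")
    first_byte = 5  # U=0, R=0, TYPE=5 (assumption: U and R left at zero)
    return struct.pack("!BHB", first_byte, length, sidx) + sample_description
```

Because sample descriptions cannot be fragmented, the size check is the only option a sender has for an oversized description: it cannot be streamed with this payload format.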
The next section specifies how SIDX values are calculated.
下一节指定如何计算SIDX值。
The buffering of sample descriptions is a matter of the client's timed text codec implementation. In order to work properly, this payload format requires that:
示例描述的缓冲是客户端的定时文本编解码器实现的问题。为了正常工作,此有效负载格式要求:
o Static sample descriptions MUST be buffered at the client, at least, for the duration of the session.
o 静态示例描述必须至少在会话期间在客户端进行缓冲。
o If dynamic sample descriptions are used, their buffering and update of the SIDX values MUST follow the mechanism described in the next section.
o 如果使用动态样本描述,它们的缓冲和SIDX值的更新必须遵循下一节中描述的机制。
The use of dynamic sample descriptions by senders is OPTIONAL. However, if they are used, senders MUST implement this mechanism. Receivers MUST always implement it.
发件人使用动态样本描述是可选的。但是,如果使用它们,发送方必须实现此机制。接收者必须始终执行它。
Dynamic SIDX values remain active either during the entire duration of the session (if used just once) or in different intervals of it (if used more than once).
动态SIDX值在整个会话期间(如果仅使用一次)或在会话的不同时间间隔(如果使用一次或多次)保持活动状态。
Note: In the following, SIDX means dynamic SIDX.
注:在下文中,SIDX表示动态SIDX。
For choosing the wraparound mechanism, the following rationale was used: There are 128 dynamic SIDX values possible, [0..127]. If one chooses to allow a maximum of 127 to be used as dynamic SIDXs, then any reordered packet with a new sample description would make the mechanism fail. For example, if the last packet received is SIDX=5, then all 127 values except SIDX=6 would be "active". Now, if a reordered packet arrives with a new description, SIDX=9, it will be mistakenly discarded, because the SIDX=9 is, at that moment, marked as "active" and active sample descriptions shall not be re-written. Therefore, a "guard interval" is introduced. This guard interval reduces the number of active SIDXs at any point in time to 64. Although most timed text applications will probably need less than 64 sample descriptions during a session (in total), a wraparound mechanism to handle the need for more is described here.
为了选择环绕机制,使用了以下基本原理:可能有128个动态SIDX值[0..127]。如果选择允许将最多127个数据包用作动态SIDx,那么任何带有新样本描述的重新排序数据包都将使该机制失败。例如,如果接收到的最后一个数据包是SIDX=5,那么除了SIDX=6之外的所有127个值都将是“活动的”。现在,如果重新排序的数据包带有一个新的描述SIDX=9,它将被错误地丢弃,因为此时SIDX=9被标记为“活动”,活动样本描述将不会被重新写入。因此,引入了“保护间隔”。此保护间隔将任何时间点的活动SIDx数量减少到64。虽然大多数定时文本应用程序在一个会话期间(总共)可能需要不到64个示例描述,但这里描述了一种处理更多示例描述的概括机制。
Thereby, a sliding window of 64 active SIDX values is used. Values within the window are "active"; all others are marked "inactive". An SIDX value becomes active if at least one sample description identified by that SIDX has been received. Since sample descriptions MAY be sent redundantly, it is possible that a client receives a given SIDX several times. However, active sample descriptions SHALL NOT be overwritten: The receiver SHALL ignore redundant sample descriptions and it MUST use the already cached copy. The "guard interval" of (64) inactive values ensures that the correct association SIDX <-> sample description is always used.
因此,使用64个活动SIDX值的滑动窗口。窗口内的值为“活动”;所有其他标记为“不活动”。如果收到至少一个由SIDX标识的样本描述,SIDX值将变为活动状态。由于示例描述可能会被冗余发送,因此客户机可能会多次收到给定的SIDX。但是,不应覆盖活动样本描述:接收器应忽略冗余样本描述,并且必须使用已缓存的副本。(64)个非活动值的“保护间隔”确保始终使用正确的关联SIDX<->示例描述。
Informative note: As for the "guard interval" value itself, 64 as 128/2 was considered simple enough while still meeting the expected maximum number of sample descriptions. Besides that, there's no other motivation for choosing 64 or a different value.
资料性说明:至于“保护间隔”值本身,64 As 128/2被认为足够简单,同时仍然满足预期的最大样本描述数。除此之外,选择64或其他值没有其他动机。
The following algorithm is used to buffer dynamic sample descriptions and to maintain the dynamic SIDX values:
以下算法用于缓冲动态样本描述并维护动态SIDX值:
Let X be the last SIDX received that updated the range of active sample descriptions. Let Y be a value within the allowed range for dynamic SIDX: [0,127], and different from X. Let Z be the SIDX of the last received sample description. Then:
假设X是最后一个接收到的更新活动样本描述范围的SIDX。设Y为动态SIDX:[0127]允许范围内的值,与X不同。设Z为上次收到的样本描述的SIDX。然后:
1. Initialize all dynamic SIDX values as inactive. For stored contents, read the sample description index in the Sample to Chunk box ("stsc") for that sample. For live streaming, the first value MAY be zero or any other value in the interval above. Go to step 2.
1. 将所有动态SIDX值初始化为非活动。对于存储的内容,请读取该样本的样本到区块框(“stsc”)中的样本描述索引。对于直播,第一个值可以是零,也可以是上述间隔中的任何其他值。转至步骤2。
2. First, in-band sample description with SIDX=Z is received and stored; set X=Z. Go to step 3.
2. 首先,接收并存储SIDX=Z的带内样本描述;设置X=Z。转至步骤3。
3. Any SIDX within the interval [X+1 modulo(128), X+64 modulo(128)] is marked as inactive, and any corresponding sample description is deleted. Any SIDX within the interval [X+65 modulo(128), X] is set active. Go to step 4 (wait state).
3. 区间[X+1模(128),X+64模(128)]内的任何SIDX都标记为非活动,并且删除任何相应的样本描述。区间[X+65模(128,X)]内的任何SIDX均设置为激活状态。转至步骤4(等待状态)。
4. Wait for next sample description. Once the client is initialized, the interval of active SIDX values MUST change whenever a sample description with an SIDX value in the inactive set is received. That is, upon reception of a sample description with SIDX=Z, do the following:
4. 等待下一个示例说明。一旦客户机初始化,只要收到非活动集合中具有SIDX值的样本描述,活动SIDX值的间隔就必须更改。也就是说,在收到SIDX=Z的样本描述后,执行以下操作:
a. If Z is in the (closed) interval [X+1 modulo(128), X+64 modulo(128)] then set X=Z, store the sample description, and go to step 3.
a. 如果Z在(闭合)间隔[X+1模(128),X+64模(128)]内,则设置X=Z,存储样本描述,然后转至步骤3。
b. Else, Z must be in the interval [X+65 modulo(128), X], thus:
b. 否则,Z必须在区间[X+65模(128),X]内,因此:
i. If SIDX=Z is not stored, then store the sample description. Go to beginning of step 4 (wait state).
ii. Else, go to the beginning of step 4 (wait state).
我如果未存储SIDX=Z,则存储样本描述。转到步骤4的开头(等待状态)。二、否则,转到步骤4的开头(等待状态)。
Informative note: It is allowed that any value of SIDX=X be sent in the interval [0,127]. For example, if [64..127] is the current active set and SIDX=0 is sent, a new sample description is defined (0) and an old one deleted (64); thus [65..127] and [0] are active. Similarly, one could now send SIDX=64, thus inverting the active and inactive sets.
资料性说明:允许在[0127]间隔内发送SIDX=X的任何值。例如,如果[64..127]是当前活动集,并且发送了SIDX=0,则定义了一个新的样本描述(0),删除了一个旧的样本描述(64);因此[65..127]和[0]处于活动状态。类似地,现在可以发送SIDX=64,从而反转活动集和非活动集。
Example: If X=4, any SIDX in the interval [5,68] is inactive. Active SIDX values are in the complementary interval [69,127] plus [0,4]. For example, if the client receives a SIDX=6, then the active interval is now different: [0,6] plus [71,127]. If the received SIDX is in the current active interval, no change SHALL be applied.
示例:如果X=4,则区间[5,68]中的任何SIDX都处于非活动状态。有效SIDX值在互补区间[69,127]加上[0,4]。例如,如果客户端接收到SIDX=6,则活动间隔现在不同:[0,6]加上[71,127]。如果接收到的SIDX处于当前激活间隔内,则不应进行任何更改。
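The wraparound algorithm above can be sketched as a small receiver-side buffer. This is a minimal illustration under stated assumptions, not a normative implementation: the class name `DynamicSidxBuffer` and its methods are hypothetical, and the inactive interval [X+1, X+64] modulo 128 is tested by computing the forward distance from X on the modulo-128 ring.

```python
class DynamicSidxBuffer:
    """Sliding window of 64 active dynamic SIDX values (steps 1 to 4)."""

    def __init__(self) -> None:
        self.x = None    # X: last SIDX that updated the active range
        self.store = {}  # SIDX -> cached sample description

    def _is_inactive(self, sidx: int) -> bool:
        # Forward distance 1..64 from X is the inactive guard interval.
        return 1 <= (sidx - self.x) % 128 <= 64

    def is_active(self, sidx: int) -> bool:
        return self.x is not None and not self._is_inactive(sidx)

    def _purge_inactive(self) -> None:
        # Step 3: delete descriptions whose SIDX became inactive.
        for sidx in [s for s in self.store if self._is_inactive(s)]:
            del self.store[sidx]

    def receive(self, z: int, description: bytes) -> bool:
        """Handle a sample description with SIDX=Z; True if it was stored."""
        if not 0 <= z <= 127:
            raise ValueError("dynamic SIDX values lie in [0,127]")
        if self.x is None or self._is_inactive(z):
            # Step 2 (first description) or step 4a: move the window.
            self.x = z
            self.store[z] = description
            self._purge_inactive()
            return True
        # Step 4b: Z is active; never overwrite a cached description.
        if z not in self.store:
            self.store[z] = description
            return True
        return False
```

With X=4, `is_active(68)` is false and `is_active(69)` is true, matching the example above; a redundant copy of an already cached description is ignored and the cached copy is kept.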
For the purpose of streaming timed text contents, some values in the boxes contained in a 3GP file are mapped to fields of this payload header. This section explains where to find those values.
为了流式传输定时文本内容,3GP文件中包含的框中的一些值映射到此有效负载头的字段。本节说明在何处查找这些值。
Additionally, for the duration and sample description indexes, extension mechanisms are provided. All senders MUST implement the extension mechanisms described herein.
此外,对于持续时间和样本描述索引,还提供了扩展机制。所有发送方必须实现本文所述的扩展机制。
If the timed text is streamed out of a 3GP file, the following guidelines SHALL be followed.
如果文件是从3GP文件中流出来的,则应遵循以下指南。
Note: All fields in the objects (boxes) of a 3GP file are found in network byte order.
注意:3GP文件的对象(框)中的所有字段均按网络字节顺序排列。
Information obtained from the Sample Table Box (stbl):
从样本表格框(stbl)获得的信息:
o Sample Descriptions and Sample Description length: The Sample Description box (stsd, inside the stbl) contains the sample descriptions. For timed text media, each element of stsd is a timed text sample entry (type "tx3g").
o 样本描述和样本描述长度:样本描述框(stsd,stbl内)包含样本描述。对于定时文本媒体,STD的每个元素都是一个定时文本样本条目(类型“tx3g”)。
The (unsigned) 32 bits of the "size" field in the stsd box represent the length (in bytes) of the sample description, as carried in TYPE 5 units. On the other hand, the LEN field of TYPE 5 units is restricted to 16 bits. Therefore, if the value of "size" is greater than (2^16-1-3)[bytes], then the sample description SHALL NOT be streamed with this payload format. There is no extension mechanism defined in this case, since fragmentation of sample descriptions is not defined (sample descriptions are typically up to some 200 bytes in size). Note: The three (3) accounts for the TYPE 5 header fields included in the LEN value.
stsd框中“大小”字段的(无符号)32位表示样本描述的长度(字节),以类型5为单位。另一方面,类型5单元的LEN字段限制为16位。因此,如果“大小”的值大于(2^16-1-3)[字节],则样本描述不应以该有效负载格式传输。在这种情况下没有定义扩展机制,因为没有定义样本描述的分段(样本描述的大小通常高达200字节)。注:这三(3)个字段用于LEN值中包含的类型5标题字段。
o SDUR from the Decoding Time to Sample Box (stts). The (unsigned) 32 bits of the "sample delta" field are used for calculating SDUR. However, since the SDUR field is only 3 bytes long, text samples with duration values larger than (2^24-1)/(timestamp clockrate)[seconds] cannot be streamed directly. The solution is simple: Copies of the corresponding text sample SHALL be sent. Thereby, the timestamp and duration values SHALL be adjusted so that a continuous display is guaranteed, as if just one sample had been sent. That is, a sample with timestamp TS and duration SDUR can be sent as two samples having timestamps TS1 and TS2 and durations SDUR1 and SDUR2, such that TS1=TS, TS2=TS1+SDUR1, and SDUR=SDUR1+SDUR2.
o 从解码时间到样本盒(stts)的SDUR。“采样增量”字段的(无符号)32位用于计算SDUR。但是,由于SDUR字段只有3字节长,因此持续时间值大于(2^24-1)/(时间戳时钟速率)[秒]的文本样本不能直接流式传输。解决方案很简单:应发送相应文本样本的副本。因此,应调整时间戳和持续时间值,以保证连续显示,如同只发送了一个样本。也就是说,具有时间戳TS和持续时间SDUR的样本可以作为具有时间戳TS1和TS2以及持续时间SDUR1和SDUR2的两个样本发送,使得TS1=TS、TS2=TS1+SDUR1和SDUR=SDUR1+SDUR2。
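The duration extension mechanism can be sketched as follows. The function name `split_long_sample` is hypothetical, and the choice to fill each copy up to the maximum SDUR before starting the next is one possible split among many; the 32-bit wraparound of the timestamp is the usual RTP behaviour.

```python
MAX_SDUR = 2**24 - 1  # largest value of the 3-byte SDUR field


def split_long_sample(timestamp: int, duration: int) -> list:
    """Return (timestamp, SDUR) pairs for the copies of one text sample."""
    pieces = []
    while duration > MAX_SDUR:
        pieces.append((timestamp, MAX_SDUR))
        # The next copy starts where the previous one ends (TS2 = TS1 + SDUR1).
        timestamp = (timestamp + MAX_SDUR) % 2**32
        duration -= MAX_SDUR
    pieces.append((timestamp, duration))
    return pieces
```

The durations of the returned pieces always sum to the original duration, so the display is continuous as if one sample had been sent.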
o Text sample length from the Sample Size Box (stsz). The (unsigned) 32 bits of the "sample size" or "entry size" (one of them, depending on whether the sample size is fixed or variable) indicate the length (in bytes) of the 3GP text sample. For obtaining the length of the (actual) streamed text sample, the length of the text string byte count (2 bytes) and, in case of UTF-16 strings, the length of the BOM (also 2 bytes) SHALL be deducted. This is illustrated in Figure 9.
o 样本大小框(stsz)中的文本样本长度。“样本大小”或“条目大小”的(无符号)32位(其中一位,取决于样本大小是固定的还是可变的)表示3GP文本样本的长度(字节)。为了获得(实际)流式文本样本的长度,应扣除文本字符串字节数(2字节)的长度,如果是UTF-16字符串,则应扣除BOM的长度(也是2字节)。如图9所示。
Text Sample according to 3GPP TS 26.245
符合3GPP TS 26.245的文本样本
                    TEXT SAMPLE (length=stsz)
      .--------------------------------------------------.
     /                                                    \
                    TEXT STRING (length=TBC)
          .------------------------------------.
         /                                      \
     TBC BOM                                     MODIFIERS
    +---+---+----------------------------------+-----------+
                      ||
                      ||       TBC BOM     -> TLEN field
                      ||      +---+---+       U bit
                      ||
                      \/
Text Sample according to this Payload Format
根据此有效负载格式的文本示例
              TEXT SAMPLE (length=SLEN w/o TBC,BOM)
         .--------------------------------------------.
        /                                              \
              TEXT STRING (length=TLEN)
         .--------------------------------.
        /                                  \
                  TEXT STRING                MODIFIERS
        +----------------------------------+-----------+
KEY: TBC = Text string Byte Count
     BOM = Byte Order Mark
关键字:TBC=文本字符串字节计数BOM=字节顺序标记
Figure 9. Text sample composition
图9。文本样本组成
Moreover, since the LEN field in TYPE 1 unit header is 16 bits long, larger text sample sizes than (2^16-1-8) [bytes] SHALL NOT be streamed. Also, in this case, no extension mechanism is defined. This is because this maximum is considered enough for the targeted streaming applications. (Note: The eight (8) accounts for the TYPE 1 header fields included in the LEN value).
此外,由于类型1单元标题中的LEN字段长度为16位,因此不应传输大于(2^16-1-8)[字节]的文本样本大小。此外,在这种情况下,没有定义扩展机制。这是因为对于目标流应用程序来说,这个最大值已经足够了。(注意:八(8)个字段用于LEN值中包含的类型1标题字段)。
o SIDX from the Sample to Chunk Box (stsc): The stsc Box is used to find samples and their corresponding sample descriptions. These are referenced by the "sample description index", a 32-bit (unsigned) integer. If possible, these indices may be directly mapped to the SIDX field. However, there are several cases where this may not be possible:
o 从样本到区块框(stsc)的SIDX:stsc框用于查找样本及其相应的样本描述。它们由“样本描述索引”引用,这是一个32位(无符号)整数。如果可能,这些索引可以直接映射到SIDX字段。但是,在某些情况下,这可能是不可能的:
a) The total number of indices used is greater than the number of indices available, i.e., if the static sample descriptions are more than 127 or the dynamic ones are more than 64.
a) 使用的索引总数大于可用的索引数,即,如果静态样本描述多于127或动态样本描述多于64。
b) The original SIDX value ranges do not fit in the allowed ranges for static (129-254) or dynamic (0-127) values.
b) 原始SIDX值范围不符合静态(129-254)或动态(0-127)值的允许范围。
Therefore, when assigning SIDX values to the sample descriptions, the following guidelines are provided:
因此,在将SIDX值指定给示例描述时,提供了以下准则:
o Static sample descriptions can simply be assigned consecutive values within the range 129-254 (closed interval). This range should be more than sufficient for static sample descriptions.
o 静态样本描述可以简单地分配129-254(闭合间隔)范围内的连续值。该范围应足以用于静态样本描述。
o As for dynamic sample descriptions:
o 关于动态样本描述:
a) Streams that use less than 64 dynamic sample descriptions SHOULD use consecutive values for SIDX anywhere in the range 0-127 (closed interval).
a) 使用少于64个动态样本描述的流应使用0-127(闭合间隔)范围内任意位置的SIDX连续值。
b) For streams with more than 64 sample descriptions, the SIDX values MUST be assigned in usage order, and if any sample description shall be used after it has been set inactive, it will need to be re-sent and assigned a new SIDX value (according to the algorithm in Section 4.2.1).
b) 对于具有64个以上样本描述的流,必须按使用顺序分配SIDX值,如果任何样本描述在设置为非活动状态后应使用,则需要重新发送并分配新的SIDX值(根据第4.2.1节中的算法)。
Information obtained from the Media Data Box:
从媒体数据框获得的信息:
o Text strings, TLEN, U bit, and modifiers from the Media Data Box (mdat). Text strings, 16-bit text string byte count, Byte Order Mark (BOM, indicating UTF encoding), and modifier boxes can be found here.
o 媒体数据框(mdat)中的文本字符串、TLEN、U位和修饰符。可以在此处找到文本字符串、16位文本字符串字节计数、字节顺序标记(BOM,表示UTF编码)和修改器框。
For TYPE 1 units, the value of TLEN is extracted from the text string byte count that precedes the text string in the text sample, as stored in the 3GP file. If UTF-16 encoding is used, two (2) more bytes have to be deducted from this byte count beforehand, in order to exclude the BOM. See Figure 9.
对于类型1单元,从文本样本中文本字符串之前的文本字符串字节计数中提取TLEN值,如3GP文件中所存储。如果使用UTF-16编码,则必须事先从该字节计数中再减去两(2)个字节,以排除BOM。参见图9。
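The mapping of Figure 9 can be sketched as a helper that turns a stored 3GP text sample into the values needed for a TYPE 1 unit. The function name `strip_3gp_text_sample` is hypothetical. Two assumptions are made that are not stated in this section: the U bit is taken to be 1 for UTF-16 and 0 for UTF-8 (see Section 4.1.1), and a little-endian BOM is rejected rather than converted.

```python
def strip_3gp_text_sample(sample: bytes):
    """Return (u, tlen, slen, text, modifiers) for one stored text sample."""
    if len(sample) < 2:
        raise ValueError("sample too short to hold the text string byte count")
    tbc = int.from_bytes(sample[:2], "big")  # text string byte count (TBC)
    text = sample[2:2 + tbc]
    if len(text) != tbc:
        raise ValueError("text string byte count exceeds the sample size")
    modifiers = sample[2 + tbc:]
    u = 0  # assumption: U=0 means UTF-8
    if text[:2] == b"\xfe\xff":
        u = 1            # assumption: U=1 means UTF-16
        text = text[2:]  # the BOM is not transported
    elif text[:2] == b"\xff\xfe":
        # Sketch only: a real sender would have to byte-swap the string.
        raise ValueError("little-endian UTF-16 is not handled in this sketch")
    tlen = len(text)                   # TBC minus the BOM, if any
    slen = len(text) + len(modifiers)  # sample length w/o TBC and BOM
    return u, tlen, slen, text, modifiers
```

For a UTF-16 sample, `tlen` is two bytes smaller than the stored byte count and `slen` is four bytes smaller than the stsz sample size, as described above.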
This section explains why text samples may have to be fragmented and discusses some of the possible approaches to doing it. A solution is proposed together with rules and recommendations for fragmenting and transporting text samples.
本节解释了为什么文本示例可能必须分段,并讨论了一些可能的方法。提出了一种解决方案,并给出了分割和传输文本样本的规则和建议。
3GPP Timed Text applications are expected to operate at low bitrates. This fact, added to the small size of timed text samples (typically one or two hundred bytes) makes fragmentation of text samples a rare event. Samples should usually fit into the MTU size of the used network path.
3GPP定时文本应用程序预计将以低比特率运行。这一事实,加上小尺寸的定时文本样本(通常为一个或两百字节),使得文本样本碎片成为一个罕见的事件。样本通常应适合所用网络路径的MTU大小。
Nevertheless, some text strings (e.g., ending roll in a movie) and some modifier boxes (i.e., for hyperlinks, for karaoke, or for styles) may become large. This may also apply for future modifier boxes. In such cases, the first option to consider is whether it is possible to adjust the encoding (e.g., the size of sample) in such a way that fragmentation is avoided. If it is, this is preferred to fragmentation and SHOULD be done.
然而,一些文本字符串(例如,电影中的结尾滚动)和一些修改框(例如,用于超链接、卡拉OK或样式)可能会变大。这也可能适用于将来的修改器框。在这种情况下,要考虑的第一个选项是是否可以以避免碎片的方式来调整编码(例如,样本的大小)。如果是,这比碎片化更好,应该这样做。
Otherwise, if this is not possible or other constraints prevent it, fragmentation MAY be used, and the basic guidelines given in this document MUST be followed:
否则,如果这是不可能的或其他限制条件阻止了它,则可以使用碎片,并且必须遵循本文件中给出的基本准则:
o It is RECOMMENDED that text samples be fragmented as seldom as possible, i.e., the least possible number of fragments is created out of a text sample.
o 建议尽可能少地分割文本样本,即从文本样本中创建尽可能少的片段。
o If spare bitrate and free space in the payload are available, sample descriptions (if at hand) SHOULD be aggregated.
o 如果有效负载中有一些比特率和可用空间,则应聚合示例描述(如果有)。
o Text strings MUST be split at character boundaries; see TYPE 2 header. Otherwise, it is not possible to display the text contents of a fragment if a previous fragment was lost. As a consequence, text string fragmentation requires knowledge of the UTF-8/UTF-16 encoding formats to determine character boundaries.
o 文本字符串必须在字符边界处拆分;参见类型2标题。否则,如果先前的片段丢失,则无法显示片段的文本内容。因此,文本字符串碎片需要了解UTF-8/UTF-16编码格式来确定字符边界。
o Unlike text strings, the modifier boxes are NOT REQUIRED to be split at meaningful boundaries. However, it is RECOMMENDED that this be done whenever possible. This decreases the effects of packet loss. This payload format does not ensure that partially received modifiers are applied to text strings. If only part of the modifiers is received, it is an application issue how to deal with these, i.e., whether or not to use them.
o 与文本字符串不同,修改器框不需要在有意义的边界处拆分。但是,建议尽可能做到这一点。这减少了数据包丢失的影响。此有效负载格式不能确保将部分接收的修饰符应用于文本字符串。如果只收到部分修改器,则如何处理这些修改器,即是否使用它们,是应用程序的问题。
Informative note: Ensuring that partially received modifiers can be applied to text strings in all cases (for all modifier types and for all fragment loss constellations) would place additional requirements on the payload format. In particular, this would require that: a) senders understand the semantics of the modifier boxes and b) specific fragment headers for each of the modifier boxes are defined, in addition to the payload formats defined below. Understanding the modifiers semantics means knowing, e.g., where each modifier starts and ends, which text fragments are affected, which modifiers may or may not be split, or what the fields indicate. This is necessary to be able to split the modifiers in such a way that each fragment can be applied independently of previous packet losses. This would require a more intelligent fragmentation entity and more complex headers. Given the low probability of fragmentation and the desire to keep the requirements low, it does not seem reasonable to specify such modifier box specific headers.
资料性说明:确保在所有情况下(对于所有修饰符类型和所有碎片丢失星座),部分接收的修饰符都可以应用于文本字符串,这将对有效负载格式提出额外要求。特别是,这需要:a)发送者理解修改器框的语义,b)定义每个修改器框的特定片段头,以及下面定义的有效负载格式。理解修饰符语义意味着知道,例如,每个修饰符的开始和结束位置、哪些文本片段受到影响、哪些修饰符可以拆分或不拆分,或者字段指示什么。这对于能够以这样的方式分割修饰符是必要的,即每个片段可以独立于先前的数据包丢失而应用。这将需要更智能的分段实体和更复杂的头。考虑到碎片化的可能性很低,并且希望保持较低的需求,指定这种特定于修改器框的标题似乎是不合理的。
o Modifier and text string fragments SHOULD be protected against packet losses, i.e., using FEC [7], retransmission [11], repetition (Section 5), or an equivalent technique. This minimizes the effects of packet loss.
o 修改器和文本字符串片段应防止数据包丢失,即使用FEC[7]、重传[11]、重复(第5节)或等效技术。这将数据包丢失的影响降至最低。
o An additional requirement when fragmenting text samples is that the start of the modifiers MUST be indicated using the payload header defined for that purpose, i.e., a TYPE 3 unit MUST be used (see Section 4.1.4). This enables a receiver to detect the start of the modifiers as long as there are not two or more consecutive packet losses.
o 分割文本样本时的另一项要求是,必须使用为此目的定义的有效载荷标题指示修饰符的开始,即必须使用类型3装置(见第4.1.4节)。这使得接收机能够检测修改器的开始,只要没有两个或更多连续的分组丢失。
o Finally, sample descriptions SHALL NOT be fragmented because they contain important information that may affect several text samples.
o 最后,样本描述不应支离破碎,因为它们包含可能影响多个文本样本的重要信息。
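The character-boundary rule for text string fragments can be illustrated with the following sketch. The function name `split_text_string` and the `max_len` parameter (the room left for text in one fragment) are hypothetical; the code assumes a well-formed string without BOM, big-endian when UTF-16 is used.

```python
from typing import List


def split_text_string(text: bytes, max_len: int, utf16: bool) -> List[bytes]:
    """Split an encoded text string into fragments at character boundaries."""
    fragments = []
    start = 0
    while start < len(text):
        end = min(start + max_len, len(text))
        if end < len(text):
            if utf16:
                end -= (end - start) % 2  # keep whole 16-bit code units
                # Do not cut between a high surrogate (D800-DBFF) and the
                # low surrogate that follows it.
                if end - start >= 2 and 0xD8 <= text[end - 2] <= 0xDB:
                    end -= 2
            else:
                # Back up while the next byte is a UTF-8 continuation byte.
                while end > start and (text[end] & 0xC0) == 0x80:
                    end -= 1
        if end == start:
            raise ValueError("max_len is too small to hold one character")
        fragments.append(text[start:end])
        start = end
    return fragments
```

Each fragment produced this way decodes on its own, so a receiver can still display the text of a fragment when an earlier fragment is lost.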
The payload headers defined in this document allow reassembling fragmented text samples. For this purpose, the standard RTP timestamp, the duration field (SDUR), and the fields TOTAL/THIS in the payload headers are used.
本文档中定义的有效负载头允许重新组装碎片文本样本。为此,将使用标准RTP时间戳、持续时间字段(SDUR)和有效负载报头中的字段TOTAL/this。
Units that belong to the same text sample MUST have the same timestamp. TYPE 5 units do not comply with this rule since they are not part of any particular text sample.
属于同一文本样本的单元必须具有相同的时间戳。类型5单位不符合此规则,因为它们不是任何特定文本样本的一部分。
The process for collecting the different fragments (units) of a text sample is as follows:
收集文本样本的不同片段(单元)的过程如下:
1. Search for units having the same timestamp value, i.e., units that belong to the same text sample or sample descriptions that shall become available at that time instant. If several units of the same sample are repeated, only one of them SHALL be used. Repeated units are those that have the same timestamp and the same values for TOTAL/THIS.
1. 搜索具有相同时间戳值的单位,即属于同一文本样本或样本描述的单位,该文本样本或样本描述应在该时刻可用。如果重复使用同一样品的多个单元,则只能使用其中一个单元。重复单位是指具有相同时间戳和相同TOTAL/THIS值的单位。
Note that, as mentioned in Section 4.1.1, the receiver SHALL ignore units with unrecognized TYPE value. However, the RTP header fields and the rest of the units (if any) in the payload are still useful.
注意,如第4.1.1节所述,接收器应忽略类型值无法识别的装置。但是,RTP报头字段和有效负载中的其余单元(如果有)仍然有用。
2. Check within this set whether any of the units from the text sample is missing. This is done using the TOTAL and THIS fields; the TOTAL field indicates how many fragments were created out of the text sample, and the THIS field indicates the position of this fragment in the text sample. As result of this operation, two outcomes are possible:
2. 检查此集合中是否缺少文本样本中的任何单位。这是使用总计和此字段完成的;TOTAL字段表示从文本样本中创建了多少片段,THIS字段表示该片段在文本样本中的位置。由于这一操作,可能产生两种结果:
a. No fragment is missing. Then, the THIS field SHALL be used to order the fragments and reassemble the text sample before forwarding it to the decoding application. Special care SHALL be taken when reassembling the text string as indicated in bullet 4 below.
a. 没有缺失任何片段。然后,在将文本样本转发给解码应用程序之前,应使用此字段对片段进行排序并重新组装文本样本。如下面的项目符号4所示,重新组装文本字符串时应特别小心。
b. One or more fragments are missing: Check whether this fragment belongs to the text string or to the modifiers. TYPE 2 units identify text string fragments, and TYPE 3 and 4 identify modifier fragments:
b. 缺少一个或多个片段:检查此片段是属于文本字符串还是属于修饰符。类型2单元标识文本字符串片段,类型3和4标识修饰符片段:
i. If the fragment or fragments missing belong to the text string and the modifiers were received complete, then the received text characters may, at least, be displayed as plain text. Some modifiers may only be applied as long as it is possible to identify the character numbers, e.g., if only the last text string fragment is lost. This is the case for modifiers defining specific font styles ('styl'), highlighted characters ('hlit'), karaoke feature ('krok'), and blinking characters ('blnk'). Other modifiers such as 'dlay' or 'tbox' can be applied without the knowledge of the character number. It is an application issue to decide whether or not to apply the modifiers.
i. 如果缺少的一个或多个片段属于文本字符串,并且修饰符已完整接收,则接收到的文本字符至少可以显示为纯文本。某些修改器只有在能够识别字符编号的情况下才能应用,例如,如果仅丢失最后一个文本字符串片段。对于定义特定字体样式(“styl”)、突出显示字符(“hlit”)、卡拉OK功能(“krok”)和闪烁字符(“blnk”)的修饰符,就是这种情况。其他修饰符,如“dlay”或“tbox”,可以在不知道字符编号的情况下应用。决定是否应用修改器是一个应用问题。
ii. If the fragment missing belongs to the modifiers and the text strings were received complete, then the incomplete modifiers may be used. The text string SHOULD at least be displayed as plain text. As mentioned in Section 4.4, modifiers may split without observing meaningful boundaries. Hence, it may not always be possible to make use of partially received modifiers. However, to avoid this, it is RECOMMENDED that the modifiers do split at meaningful boundaries.
二、如果缺少的片段属于修饰符,并且文本字符串接收完整,则可以使用不完整的修饰符。文本字符串至少应显示为纯文本。如第4.4节所述,修饰符可能在没有观察到有意义边界的情况下分裂。因此,可能并不总是能够使用部分接收的修改器。但是,为了避免这种情况,建议修改器在有意义的边界处拆分。
iii. A third possibility is that it is not possible to discern whether modifiers or text strings were received complete. For example, if the TYPE 3 unit of a sample plus the following or preceding packet is lost, there is no way for the RTP receiver to know if one or both packets lost belong to the modifiers or if there are also some missing text strings. Repetition, FEC, retransmission, or other protection mechanisms as per section 4.6 are RECOMMENDED to avoid this situation.
第三种可能性是,无法辨别是否收到完整的修饰符或文本字符串。例如,如果样本的类型3单元加上以下或之前的数据包丢失,RTP接收器无法知道丢失的一个或两个数据包是否属于修饰符,或者是否还有一些丢失的文本字符串。建议采用第4.6节规定的重复、FEC、重传或其他保护机制来避免这种情况。
iv. Finally, if it is sure that neither text strings nor modifiers were received complete, then the text strings and the modifiers may be rendered partially or may be discarded. This is an application choice.
最后,如果确定文本字符串和修饰符均未完整接收,则文本字符串和修饰符可部分呈现或丢弃。这是一个应用程序选择。
3. Sample descriptions can be directly associated with the reassembled text samples, via the sample description index (SIDX).
3. 样本描述可以通过样本描述索引(SIDX)与重新组合的文本样本直接关联。
4. Reassembling of text strings: Since the text strings transported in RTP packets MUST NOT include any byte order mark (BOM), the receiver MUST prepend it to the reassembled UTF-16 string before handing it to the timed text decoder (see Figure 9). The value of the BOM is 0xFEFF because only big endian serialization of UTF-16 strings is supported by this payload format.
4. 文本字符串的重新组合:由于RTP数据包中传输的文本字符串不得包含任何字节顺序标记(BOM),接收器必须在将其处理到定时文本解码器之前将其预编到重新组合的UTF-16字符串中(见图9)。BOM的值为0xFEFF,因为此有效负载格式只支持UTF-16字符串的大端序列化。
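The collection steps above can be sketched as follows. This is an illustration only: the function name `reassemble_text_sample` is hypothetical, the input is assumed to be the units that already share one RTP timestamp, each given as a `(type, total, this, contents)` tuple with the unit header removed, and THIS is assumed to count from 1 (consistent with TOTAL=THIS for the last fragment).

```python
def reassemble_text_sample(units, utf16: bool):
    """Return (text, modifiers, missing) for the fragments of one sample."""
    if not units:
        raise ValueError("no units to reassemble")
    total = units[0][1]
    by_position = {}
    for unit_type, _total, this, contents in units:
        # Repeated units (same TOTAL/THIS): only one of them is used.
        by_position.setdefault(this, (unit_type, contents))
    # Step 2: detect missing fragments from TOTAL and THIS.
    missing = [i for i in range(1, total + 1) if i not in by_position]
    ordered = [by_position[i] for i in sorted(by_position)]
    # TYPE 2 carries text string fragments, TYPE 3 and 4 carry modifiers.
    text = b"".join(c for t, c in ordered if t == 2)
    modifiers = b"".join(c for t, c in ordered if t in (3, 4))
    if utf16:
        # Step 4: prepend the big-endian BOM before handing to the decoder.
        text = b"\xfe\xff" + text
    return text, modifiers, missing
```

When `missing` is not empty, the returned text and modifiers are partial; whether to render or discard them is left to the application, as in cases i. to iv. above.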
Units SHOULD be aggregated to avoid overhead, whenever possible. The aggregate payloads MUST comply with one of the following ordered configurations:
只要有可能,应聚合单元以避免开销。总有效载荷必须符合以下一种有序配置:
1. Zero or more sample descriptions (TYPE 5) followed by zero or more whole text samples (TYPE 1 units). At least one unit of either type MUST be present.
1. 零个或多个样本描述(类型5),后跟零个或多个全文样本(类型1单元)。任何一种类型的装置必须至少有一个。
2. Zero or more sample descriptions followed by zero or one modifier fragment, either TYPE 3 or TYPE 4. At least one unit MUST be present.
2. 零个或多个示例描述,后跟零个或一个修饰符片段,类型3或类型4。必须至少有一个单元存在。
3. Zero or more sample descriptions, followed by zero or one text string fragment (TYPE 2), followed by zero or one TYPE 3 unit. If a TYPE 2 unit and a TYPE 3 unit are present, then they MUST belong to the same text sample. At least one unit MUST be present.
3. 零个或多个示例描述,后跟零个或一个文本字符串片段(类型2),后跟零个或一个类型3单元。如果存在类型2单元和类型3单元,则它们必须属于同一文本样本。必须至少有一个单元存在。
Some observations:
一些意见:
o Different aggregates than the ones listed above SHALL NOT be used.
o 不得使用与上述骨料不同的骨料。
o Sample descriptions MUST be placed in the aggregate payload before the occurrence of any non-TYPE 5 units.
o 在出现任何非5型装置之前,必须将样本说明置于聚合有效载荷中。
o Correct reception of TYPE 5 units is important since their contents may be referenced by several other units in the stream.
o 正确接收类型5单元非常重要,因为它们的内容可能会被流中的其他几个单元引用。
Receivers are unable to use text samples until their corresponding sample descriptions are received. Accordingly, a sender SHOULD send multiple copies of a sample description to ensure reliability (see Section 5). Receivers MAY use payload-specific feedback messages [21] to tell a sender that they have received a particular sample description.
在收到相应的样本描述之前,接收者无法使用文本样本。因此,发送方应发送多份样本描述副本,以确保可靠性(见第5节)。接收者可以使用特定于有效载荷的反馈消息[21]来告诉发送者他们已经收到特定的样本描述。
o Regarding timestamp calculation: In general, the rules for calculating the timestamp of units in an aggregate payload depend on the type of unit. Based on the possible constellations for aggregate payloads, as above, we have:
o 关于时间戳计算:一般来说,计算聚合有效负载中单元的时间戳的规则取决于单元的类型。根据上述聚合有效载荷的可能星座,我们有:
o Sample descriptions MUST receive the RTP timestamp of the packet in which they are included.
o 样本描述必须接收包含它们的数据包的RTP时间戳。
Note that for TYPE 5 units, the timestamp actually does not represent the instant when they are played out, but instead the instant at which they become available for use.
请注意,对于类型5单元,时间戳实际上并不表示播放它们的时刻,而是表示它们可供使用的时刻。
o For the first configuration: The first TYPE 1 unit receives the RTP timestamp. The timestamp of any subsequent TYPE 1 unit MUST be obtained by adding sample duration and timestamp, both of the preceding TYPE 1 unit.
o 对于第一个配置:第一个类型1单元接收RTP时间戳。任何后续类型1单元的时间戳必须通过添加样本持续时间和时间戳来获得,这两个样本持续时间和时间戳都是之前的类型1单元。
o For the second and third configuration, all units, TYPE 2, 3, and 4, MUST receive the RTP timestamp.
o 对于第二和第三种配置,类型2、3和4的所有单元都必须接收RTP时间戳。
Refer to detailed examples on the timestamp calculation below.
请参阅下面有关时间戳计算的详细示例。
o As per configuration 3 above, a payload MAY contain several fragments of one (and only one) text sample. If it does, then exactly one TYPE 2 unit followed by exactly one TYPE 3 unit is allowed in the same payload. This is in line with RFC 3640 [12], Section 2.4, which explicitly disallows combining fragments of different samples in the same RTP payload. Note that, in this special case, no timestamp calculation is needed. That is, the RTP timestamp of both units is equal to the timestamp in the packet's RTP header.
o 根据上面的配置3,有效负载可能包含一个(且仅一个)文本样本的多个片段。如果是这样,则在同一有效载荷中允许正好有一个2型装置后跟正好一个3型装置。这与RFC 3640[12]第2.4节一致,该节明确禁止在同一RTP有效载荷中组合不同样本的片段。注意,在这种特殊情况下,不需要计算时间戳。也就是说,两个单元的RTP时间戳等于包的RTP报头中的时间戳。
o Finally, note that the use of empty text samples allows for aggregating non-consecutive TYPE 1 units in the same payload. Two text samples, with timestamps TS1 and TS3 and durations SDUR1 and SDUR3, are not consecutive if it holds TS1+SDUR1 < TS3. A solution for this is to include an empty TYPE 1 unit with duration SDUR2 between them, such that TS2+SDUR2 = TS1+SDUR1+SDUR2 = TS3.
o 最后,请注意,使用空文本样本允许在同一负载中聚合非连续的1型单元。具有时间戳TS1和TS3以及持续时间SDUR1和SDUR3的两个文本样本如果保持TS1+SDUR1<TS3,则它们不是连续的。解决方案是包括一个空的1型机组,其持续时间为SDUR2,这样TS2+SDUR2=TS1+SDUR1+SDUR2=TS3。
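The timestamp rules for the first configuration, and the use of an empty TYPE 1 unit to bridge a gap, can be sketched as follows. Both function names are hypothetical; the modulo 2^32 arithmetic reflects the 32-bit RTP timestamp.

```python
def type1_timestamps(rtp_timestamp: int, durations) -> list:
    """Timestamps of consecutive TYPE 1 units in one aggregate payload."""
    timestamps = []
    ts = rtp_timestamp  # the first TYPE 1 unit receives the RTP timestamp
    for sdur in durations:
        timestamps.append(ts)
        # Each following unit starts when the preceding one ends.
        ts = (ts + sdur) % 2**32
    return timestamps


def empty_sample_duration(ts1: int, sdur1: int, ts3: int) -> int:
    """SDUR2 of the empty TYPE 1 unit placed between two samples."""
    # TS2 = TS1 + SDUR1 and TS2 + SDUR2 = TS3, hence SDUR2 = TS3 - TS1 - SDUR1.
    return (ts3 - ts1 - sdur1) % 2**32
```

For example, samples at TS1=1000 (SDUR1=100) and TS3=1500 need an empty unit with SDUR2=400, after which `type1_timestamps(1000, [100, 400, 30])` yields the timestamps 1000, 1100, and 1500.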
Some examples of aggregate payloads are illustrated in Figure 10. (Note: The figure is not scaled.)
图10显示了一些聚合有效载荷的示例。(注:该图未按比例缩放。)
    N/A    TS1   TS2    TS3
  +------+-----+------+-----+
  |TYPE5 |TYPE1|TYPE1 |TYPE1|
  +------+-----+------+-----+
    N/A   sdur1 sdur2  sdur3

                                  N/A    TS4
                                +-----+-------+
                                |TYPE5| TYPE 1|                     a)
                                +-----+-------+
                                  N/A   sdur4

                                      TS4          TS4    TS4
                                +--------------+ +--------------+
                                |    TYPE2     | |TYPE2 |TYPE 3 |   b)
                                +--------------+ +--------------+
                                     sdur4        sdur4   sdur4

                                      TS4              TS4
                                +--------------+ +--------------+
                                | TYPE2| TYPE 3| |    TYPE4     |   c)
                                +--------------+ +--------------+
                                 sdur4   sdur4        sdur4

  |----------PAYLOAD 1------|   |--PAYLOAD 2---| |--PAYLOAD 3---|
            rtpts1                   rtpts2           rtpts3
KEY:
TSx    = Text Sample x
rtptsy = the standard RTP timestamp for PAYLOAD y
sdurx  = the duration of Text Sample x
N/A    = not applicable
KEY:TSx = 文本样本x;rtptsy = 有效负载y的标准RTP时间戳;sdurx = 文本样本x的持续时间;N/A = 不适用
Figure 10. Example aggregate payloads
图10。聚合有效载荷示例
In Figure 10, four text samples (TS1 through TS4) are sent using three RTP packets. These configurations have been chosen to show how the 5 TYPE headers are used. Additionally, three different possibilities for the last text sample, TS4, are depicted: a), b), and c).
在图10中,使用三个RTP数据包发送四个文本样本(TS1到TS4)。选择这些配置是为了说明如何使用5种类型的标头。此外,最后一个文本示例TS4描述了三种不同的可能性:a)、b)和c)。
In Figure 11, option b) from Figure 10 is chosen to illustrate how the timestamp for each unit is found.
在图11中,选择图10中的选项b)来说明如何找到每个单元的时间戳。
  N/A    TS1   TS2    TS3          TS4         TS4     TS4
+------+-----+------+-----+ +--------------+ +--------------+
|TYPE5 |TYPE1|TYPE1 |TYPE1| |    TYPE2     | |TYPE2 |TYPE 3 |
+------+-----+------+-----+ +--------------+ +--------------+
  N/A   sdur1 sdur2  sdur3        sdur4       sdur4   sdur4

  (#1)  (#2)  (#3)   (#4)         (#5)        (#6)    (#7)

|----------PAYLOAD 1------| |--PAYLOAD 2---| |--PAYLOAD 3---|
          rtpts1                 rtpts2           rtpts3
Figure 11. Selected payloads from Figure 10
图11。从图10中选择的有效载荷
Here, TSx means Text Sample x, rtptsy is the standard RTP timestamp for PAYLOAD y, and sdurx is the duration of Text Sample x. The timestamp for unit #z, ts(#z), is the sum of rtptsy and the cumulative sum of the durations of the preceding units in that payload (except in the case of PAYLOAD 3, as per rule 3 above). Thus, we have:
假设TSx表示文本样本x,rtptsy表示有效载荷y和sdurx的标准RTP时间戳,则文本样本x的持续时间,即单位#z的时间戳ts(#z),可作为rtptsy和该有效载荷中先前单位持续时间的累计和的总和找到(根据上述规则3,有效载荷3的情况除外)。因此,我们:
1. for the units in the first aggregate payload, PAYLOAD 1:
1. 对于第一个聚合有效载荷中的装置,有效载荷1:
ts(#1) = rtpts1
ts(#2) = rtpts1
ts(#3) = rtpts1 + sdur1
ts(#4) = rtpts1 + sdur1 + sdur2
Note that the TYPE 5 unit and the first TYPE 1 unit both have the RTP timestamp.
注意,类型5和第一个类型1单元都具有RTP时间戳。
2. for PAYLOAD 2:
2. 对于有效载荷2:
ts(#5) = rtpts2
3. for PAYLOAD 3:
3. 对于有效载荷3:
ts(#6) = ts(#7) = rtpts2 = rtpts3
According to configuration 3 above, the TYPE 2 and the TYPE 3 units shall belong to the same sample. Hence, rtpts3 must be equal to rtpts2. For the same reason, the value of SDUR is not used to calculate the timestamp of the next unit.
根据上述配置3,类型2和类型3装置应属于同一个样品。因此,rtpts3必须等于rtpts2。出于同样的原因,SDUR的值不能用于计算下一个单元的时间戳。
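The three cases above can be summarized in a short sketch. It is only an illustration under stated assumptions: the function name and the (TYPE, SDUR) tuple representation are hypothetical, TYPE 5 units are assigned the running timestamp without advancing it (which reproduces ts(#1) = rtpts1 when the TYPE 5 unit comes first), and any payload containing a TYPE 2, 3, or 4 unit is treated as a fragment payload whose units all take the RTP timestamp.

```python
from typing import List, Optional, Tuple

# (TYPE, SDUR); SDUR is None for TYPE 5 units, which carry no duration.
Unit = Tuple[int, Optional[int]]


def unit_timestamps(rtp_ts: int, units: List[Unit]) -> List[int]:
    """Return the timestamp of every unit in one (aggregate) payload."""
    # Fragments of a sample (TYPE 2, 3, 4) all take the RTP timestamp;
    # their SDUR is never used to derive the next unit's timestamp.
    if any(unit_type in (2, 3, 4) for unit_type, _ in units):
        return [rtp_ts] * len(units)

    timestamps = []
    elapsed = 0  # cumulative duration of the preceding TYPE 1 units
    for unit_type, sdur in units:
        timestamps.append((rtp_ts + elapsed) % 2**32)  # RTP timestamps are 32 bits
        if unit_type == 1:
            elapsed += sdur
    return timestamps


# PAYLOAD 1 of Figure 11: TYPE 5, then three TYPE 1 units.
payload1 = [(5, None), (1, 100), (1, 250), (1, 50)]
assert unit_timestamps(9000, payload1) == [9000, 9000, 9100, 9350]
```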
Some examples of payloads using the defined headers are shown below:
使用定义的标题的一些有效载荷示例如下所示:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|U|   R   |TYPE1|       LEN (always >=8)        |     SIDX      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     SDUR                      |     TLEN      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     TLEN      |                                               |
+---------------+                                               |
|                  text string (no.bytes=TLEN)                  |
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              modifiers (no.bytes=LEN - 8 - TLEN)              |
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|U|   R   |TYPE1|       LEN (always >=8)        |     SIDX      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     SDUR                      |     TLEN      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     TLEN      |                                               |
+---------------+                                               |
|                  text string (no.bytes=TLEN)                  |
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              modifiers (no.bytes=LEN - 8 - TLEN)              |
|                                               +-+-+-+-+-+-+-+-+
|                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 12. A payload carrying two TYPE 1 units
图12。装载两个1型装置的有效载荷
In Figure 12, an RTP packet carrying two TYPE 1 units is depicted. It can be seen how the length fields LEN and TLEN can be used to find the start of the next unit (LEN), the start of the modifiers (TLEN), and the length of the modifiers (LEN - 8 - TLEN).
在图12中,描述了承载两个类型1单元的RTP数据包。可以看到如何使用长度字段LEN和TLEN来查找下一个单位的开始(LEN)、修改器的开始(TLEN)和修改器的长度(LEN - 8 - TLEN)。
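The use of LEN and TLEN to walk an aggregate payload can be sketched as follows. This is a hedged illustration, not a reference parser: the names are hypothetical, the bit layout is read off Figure 12 (U in the top bit, TYPE in the low three bits of the first byte), and LEN is assumed to count from the LEN field itself, which is what "LEN always >= 8" and "modifiers = LEN - 8 - TLEN" imply. The payload is assumed to contain only TYPE 1 units.

```python
import struct
from typing import Iterator, NamedTuple


class Type1Unit(NamedTuple):
    u_bit: int
    sidx: int
    sdur: int
    text: bytes
    modifiers: bytes


def iter_type1_units(payload: bytes) -> Iterator[Type1Unit]:
    """Yield the TYPE 1 units of an RTP payload (RTP header already removed)."""
    pos = 0
    while pos < len(payload):
        if pos + 9 > len(payload):
            raise ValueError("truncated TYPE 1 unit header")
        first = payload[pos]
        if first & 0x07 != 1:
            raise ValueError("not a TYPE 1 unit")
        (length,) = struct.unpack_from("!H", payload, pos + 1)
        sidx = payload[pos + 3]
        sdur = int.from_bytes(payload[pos + 4:pos + 7], "big")  # 24-bit field
        (tlen,) = struct.unpack_from("!H", payload, pos + 7)
        unit_end = pos + 1 + length          # LEN locates the next unit
        if length < 8 or tlen > length - 8 or unit_end > len(payload):
            raise ValueError("inconsistent LEN/TLEN")
        text_start = pos + 9
        mod_start = text_start + tlen        # TLEN locates the modifiers
        yield Type1Unit(first >> 7, sidx, sdur,
                        payload[text_start:mod_start],
                        payload[mod_start:unit_end])  # LEN - 8 - TLEN bytes
        pos = unit_end


def build_type1_unit(sidx: int, sdur: int, text: bytes, modifiers: bytes = b"") -> bytes:
    """Hypothetical helper that serializes one TYPE 1 unit for the example."""
    length = 8 + len(text) + len(modifiers)
    return (bytes([0x01]) + struct.pack("!H", length) + bytes([sidx])
            + sdur.to_bytes(3, "big") + struct.pack("!H", len(text))
            + text + modifiers)


payload = build_type1_unit(1, 500, b"Hello") + build_type1_unit(2, 250, b"Hi", b"\x01\x02")
units = list(iter_type1_units(payload))
assert [u.text for u in units] == [b"Hello", b"Hi"]
```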
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|U|   R   |TYPE5|        LEN (always >3)        |     SIDX      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|             sample description (no.bytes=LEN - 3)             |
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|U|   R   |TYPE1|       LEN (always >=8)        |     SIDX      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     SDUR                      |     TLEN      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     TLEN      |                                               |
+-+-+-+-+-+-+-+-+                                               |
|             text string fragment (no.bytes=TLEN)              |
|                                                               |
|                                                               |
|                                               +-+-+-+-+-+-+-+-+
|                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 13. An RTP packet carrying a TYPE 5 and a TYPE 1 unit
图13。携带类型5和类型1单元的RTP数据包
In Figure 13, a sample description and a TYPE 1 unit are aggregated. The TYPE 1 unit happens to contain only text strings and is small, so an additional TYPE 5 unit is included to take advantage of the available bits in the packet.
在图13中,聚合了一个示例描述和一个类型1单元。类型1单元恰好只包含文本字符串,而且很小,因此还包括一个额外的类型5单元,以利用数据包中的可用位。
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|U|   R   |TYPE2|        LEN (always >9)        |TOTAL=4|THIS=1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     SDUR                      |     SIDX      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             SLEN              |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|            text string fragment (no.bytes=LEN - 9)            |
|                                                               |
:                                                               :
:                                                               :
|                                               +-+-+-+-+-+-+-+-+
|                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 14. Payload with first text string fragment of a sample
图14。带有示例的第一个文本字符串片段的有效负载
In Figures 14, 15, and 16, a text sample is split into three RTP packets. In Figure 14, the text string is big and takes the whole packet length. In Figure 15, the only possibility for carrying two fragments of the same text sample is represented (see configuration 3 in Section 4.6). The last packet, shown in Figure 16, carries the last modifier fragment, a TYPE 4.
在图14、15和16中,一个文本样本被分成三个RTP数据包。在图14中,文本字符串很大,占据了整个数据包的长度。在图15中,表示了携带相同文本样本的两个片段的唯一可能性(参见第4.6节中的配置3)。最后一个数据包,如图16所示,携带最后一个修饰符片段,类型4。
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|U|   R   |TYPE2|        LEN (always >9)        |TOTAL=4|THIS=2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     SDUR                      |     SIDX      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             SLEN              |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|            text string fragment (no.bytes=LEN - 9)            |
|                                                               |
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|U|   R   |TYPE3|        LEN (always >6)        |TOTAL=4|THIS=3 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     SDUR                      |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                                                               |
|                 modifiers (no.bytes=LEN - 6)                  |
|                                               +-+-+-+-+-+-+-+-+
|                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 15. An RTP packet carrying a TYPE 2 unit and a TYPE 3 unit
图15。携带类型2单元和类型3单元的RTP数据包
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|U|   R   |TYPE4|        LEN (always >6)        |TOTAL=4|THIS=4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     SDUR                      |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                                                               |
|                 modifiers (no.bytes=LEN - 6)                  |
|                                               +-+-+-+-+-+-+-+-+
|                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 16. An RTP packet carrying last modifiers fragment (TYPE 4)
图16。携带最后修饰符片段的RTP数据包(类型4)
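The fragment headers in Figures 14 through 16 can be read with a few lines of code. The sketch below is an assumption-laden illustration: the function name and the returned dictionary keys are hypothetical, the field positions are taken from the figures (TOTAL in the upper and THIS in the lower four bits of the fourth byte), and LEN is assumed to count from the LEN field itself, so that the fragment data is LEN - 9 bytes for TYPE 2 and LEN - 6 bytes for TYPE 3 and TYPE 4.

```python
def parse_fragment_unit(unit: bytes) -> dict:
    """Parse one TYPE 2, 3, or 4 unit starting at its first header byte."""
    if len(unit) < 7:
        raise ValueError("truncated fragment header")
    unit_type = unit[0] & 0x07
    if unit_type not in (2, 3, 4):
        raise ValueError("not a fragment unit")
    length = int.from_bytes(unit[1:3], "big")
    fixed = 9 if unit_type == 2 else 6      # header bytes counted by LEN
    if length < fixed or 1 + length > len(unit):
        raise ValueError("inconsistent LEN")
    info = {
        "type": unit_type,
        "total": unit[3] >> 4,               # number of fragments of the sample
        "this": unit[3] & 0x0F,              # index of this fragment
        "sdur": int.from_bytes(unit[4:7], "big"),
    }
    if unit_type == 2:
        info["sidx"] = unit[7]
        info["slen"] = int.from_bytes(unit[8:10], "big")
        info["data"] = unit[10:1 + length]   # text string fragment
    else:
        info["data"] = unit[7:1 + length]    # modifier fragment
    return info


# First fragment of a sample split into four fragments (compare Figure 14).
first = (bytes([0x02]) + (9 + 5).to_bytes(2, "big") + bytes([0x41])
         + (400).to_bytes(3, "big") + bytes([1]) + (20).to_bytes(2, "big")
         + b"Hello")
parsed = parse_fragment_unit(first)
assert (parsed["total"], parsed["this"], parsed["data"]) == (4, 1, b"Hello")
```

A receiver would typically collect the units that share one timestamp and order them by THIS until TOTAL fragments are present.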
RFC 3640 [12] defines a payload format for the transport of any non-multiplexed MPEG-4 elementary stream. One of the various MPEG-4 elementary stream types is MPEG-4 timed text streams, specified in MPEG-4 part 17 [26], also known as ISO/IEC 14496-17. MPEG-4 timed text streams are capable of carrying 3GPP timed text data, as specified in 3GPP TS 26.245 [1].
RFC 3640[12]定义了用于传输任何非多路复用MPEG-4基本流的有效载荷格式。各种MPEG-4基本流类型之一是MPEG-4定时文本流,在MPEG-4第17部分[26]中规定,也称为ISO/IEC 14496-17。如3GPP TS 26.245[1]所述,MPEG-4定时文本流能够承载3GPP定时文本数据。
MPEG-4 timed text streams are intentionally constructed so as to guarantee interoperability between RFC 3640 and this payload format. This means that the construction of the RTP packets carrying timed text is the same. That is, the MPEG-4 timed text elementary stream as per ISO/IEC 14496-17 is identical to the (aggregate) payloads constructed using this payload format.
有意构造MPEG-4定时文本流,以保证RFC 3640和该有效负载格式之间的互操作性。这意味着承载定时文本的RTP包的构造是相同的。也就是说,根据ISO/IEC 14496-17的MPEG-4定时文本基本流与使用该有效载荷格式构造的(聚合)有效载荷相同。
Figure 17 illustrates the process of constructing an RTP packet containing timed text. As can be seen in the partition block, the (transport) units used in this payload format are identical to the Timed Text Units (TTUs) defined in ISO/IEC 14496-17. Likewise, the rules for payload aggregation as per Section 4.6 are identical to those defined in ISO/IEC 14496-17 and are compliant with RFC 3640. As a result, an RTP packet that uses this payload format is identical to an RTP packet using RFC 3640 conveying TTUs according to ISO/IEC 14496-17. In particular, MPEG-4 Part 17 specifies that when using RFC 3640 for transporting timed text streams, the "streamType" parameter value is set to 0x0D, and the value of the "objectTypeIndication" in "config" takes the value 0x08.
图17说明了构建包含定时文本的RTP数据包的过程。从分区块中可以看出,此有效负载格式中使用的(传输)单元与ISO/IEC 14496-17中定义的定时文本单元(TTU)相同。同样,第4.6节规定的有效载荷聚合规则与ISO/IEC 14496-17中定义的规则相同,并符合RFC 3640。因此,根据ISO/IEC 14496-17,使用该有效载荷格式的RTP分组与使用RFC 3640传输TTU的RTP分组相同。具体而言,MPEG-4第17部分规定,当使用RFC 3640传输定时文本流时,“streamType”参数值设置为0x0D,“config”中“objectTypeIndication”的值取0x08。
               +--------------------------------------+
  Text samples | +--------------+    +--------------+ |
  as per 3GPP  | |Text Sample 1 |    |Text Sample N | |
  TS 26245     | +--------------+    +--------------+ |
               +--------------------------------------+
                                 \/
+-------------------------------------------------------------------+
|  Partition Text Samples into units. TTU[i]= TYPE i units.         |
|                                                                   |
|[U R TYPE LEN][{TOTAL,THIS}SIDX{SDUR}{TLEN}{SLEN}][SampleContents] |
|{..} means present if applicable, [..] means always present        |
+-------------------------------------------------------------------+
                                 \/
                                 \/
+-------------------------------------------------------------------+
|                     Aggregation (if possible)                     |
+-------------------------------------------------------------------+
                                 \/
                                 \/
+-------------------------------------------------------------------+
| RTP Entity adds and fills RTP header and Sends RTP packet, where  |
| RTP packets according to this Payload Format =                    |
| RTP packets carrying MPEG-4 Timed Text ES over RFC 3640           |
+-------------------------------------------------------------------+
Figure 17. Relation to RFC 3640
图17。与RFC 3640的关系
Note: The use of RFC 3640 for transport of ISO/IEC 14496-17 data does not require any new SDP parameters or any new mode definition.
注:使用RFC 3640传输ISO/IEC 14496-17数据不需要任何新的SDP参数或任何新的模式定义。
RFC 2793 [22] and its revision, RFC 4103 [23], specify a protocol for enabling text conversation. Typical applications of this payload format are text communication terminals and text conferencing tools. Text session contents are specified in ITU-T Recommendation T.140 [24]. T.140 text is UTF-8 coded as specified in T.140 [24] with no extra framing. The T140block contains one or more T.140 code elements as specified in T.140. Code elements are control sequences such as "New Line", "Interrupt", "String Terminator", or "Start of String". Most T.140 code elements are single ISO 10646 [25] characters, but some are multiple character sequences. Each character is UTF-8 encoded [18] into one or more octets.
RFC 2793[22]及其修订版RFC 4103[23]指定了用于启用文本对话的协议。这种有效载荷格式的典型应用是文本通信终端和文本会议工具。ITU-T建议T.140[24]中规定了文本会话内容。T.140文本按照T.140[24]中的规定进行UTF-8编码,没有额外的框架。T140块包含T.140中规定的一个或多个T.140代码元素。代码元素是控制序列,如“新行”、“中断”、“字符串终止符”或“字符串开始”。大多数T.140代码元素是单个ISO10646[25]字符,但有些是多字符序列。每个字符都经过UTF-8编码[18]为一个或多个八位字节。
This payload format may also be used for conversational applications (even for instant messaging). However, this is not its main target. The differentiating feature of 3GPP Timed Text media format is that it allows text decoration. This is especially useful in multimedia presentations, karaoke, commercial banners, news tickers, clickable text strings, and captions. T.140 text contents used in RFC 2793 do not allow the use of text decoration.
此有效负载格式也可用于会话应用程序(甚至用于即时消息)。然而,这不是它的主要目标。3GPP定时文本媒体格式的区别在于它允许文本装饰。这在多媒体演示、卡拉OK、商业横幅、新闻标签、可点击文本字符串和字幕中特别有用。T.140 RFC 2793中使用的文本内容不允许使用文本装饰。
Furthermore, the conversational text RTP payload format recommends a method to include redundant text from already transmitted packets in order to reduce the risk of text loss caused by packet loss. Thereby payloads would include a redundant copy of the last payload sent. This payload format does not describe such a method, but this is also applicable here. As explained in Section 5, packet redundancy SHOULD be used, whenever possible. The aggregation guidelines in Section 4.6 allow redundant payloads.
此外,会话文本RTP有效载荷格式推荐了一种包括来自已经传输的分组的冗余文本的方法,以减少由分组丢失引起的文本丢失的风险。因此,有效载荷将包括最后发送的有效载荷的冗余副本。该有效负载格式没有描述这种方法,但在这里也适用。如第5节所述,应尽可能使用数据包冗余。第4.6节中的聚合指南允许冗余有效载荷。
Apart from the basic fragmentation guidelines described in the section above, the simplest option for packet-loss-resilient transport is packet repetition. This mechanism may consist of a strict window-based repetition mechanism or, simply, a repetition mechanism in a wider sense, where new and old packets are mixed, for example.
除了上面一节中描述的基本碎片准则外,分组丢失弹性传输的最简单选项是分组重复。该机制可以包括严格的基于窗口的重复机制,或者简单地说,包括更广泛意义上的重复机制,例如,在新分组和旧分组混合的情况下。
A server MAY decide to use repetition as a measure for packet loss resilience. Thereby, a server MAY send the same RTP payloads or just some of the units from the payloads.
服务器可以决定使用重复作为分组丢失恢复能力的度量。因此,服务器可以发送相同的RTP有效载荷,或者仅发送来自有效载荷的部分单元。
As in the case of complete payloads, single repeated units MUST exactly match the same units sent in the first transmission; i.e., if fragmentation is needed, it SHALL be performed only once for each text sample. Only then can a receiver use the already received and the repeated units to reconstruct the original text samples. Since the RTP timestamp is used to group together the fragments of a sample, care must be taken to preserve the timing of units when constructing new RTP packets.
对于完整有效载荷的情况,单个重复单元必须与第一次传输中发送的相同单元完全匹配;i、 例如,如果需要分段,则每个文本样本只能执行一次。只有这样,接收者才能使用已经接收到的和重复的单元来重构原始文本样本。由于RTP时间戳用于将样本的片段组合在一起,因此在构造新的RTP数据包时,必须注意保持单元的计时。
For example, if a text sample was originally sent as a single non-fragmented text sample (one TYPE 1 unit), a repetition of that sample MUST be sent also as a single non-fragmented text sample in one unit. Likewise, if the original text sample was fragmented and spread over several RTP packets (say, a total of 3 units), then the repeated fragments SHALL also have the same byte boundaries and use the same unit headers and bytes per fragment.
例如,如果文本样本最初作为单个非碎片文本样本(一个类型1单元)发送,则该样本的重复也必须作为一个单元中的单个非碎片文本样本发送。同样,如果原始文本样本被分割并分布在几个RTP数据包上(例如,总共3个单元),则重复的片段也应具有相同的字节边界,并使用相同的单元头和每个片段的字节。
With repetition, repeated units resolve to the same timestamp as their originals. Where redundant units are available, only one of them SHALL be used.
通过重复,重复的单元解析为与原始单元相同的时间戳。如果冗余装置可用,则只能使用其中一个。
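Because a repeated unit resolves to the same timestamp as its original, a receiver can discard redundant copies with a simple lookup. The following is only a sketch: the record type, its field names, and the choice of (timestamp, TYPE, THIS, SIDX) as the identity of a unit are assumptions made for illustration and are not prescribed by this payload format.

```python
from typing import Iterable, List, NamedTuple, Optional


class ReceivedUnit(NamedTuple):
    timestamp: int           # timestamp the unit resolves to
    unit_type: int           # TYPE field, 1 to 5
    this: Optional[int]      # THIS field for TYPE 2/3/4 fragments, else None
    sidx: Optional[int]      # SIDX where the header carries one, else None
    data: bytes


def drop_repeated_units(units: Iterable[ReceivedUnit]) -> List[ReceivedUnit]:
    """Keep the first copy of every unit and drop later repetitions."""
    seen = set()
    kept = []
    for unit in units:
        key = (unit.timestamp, unit.unit_type, unit.this, unit.sidx)
        if key in seen:
            continue  # a redundant copy; only one of them is used
        seen.add(key)
        kept.append(unit)
    return kept


original = ReceivedUnit(9000, 1, None, 1, b"Hello")
repeat = ReceivedUnit(9000, 1, None, 1, b"Hello")
newer = ReceivedUnit(9100, 1, None, 1, b"World")
assert drop_repeated_units([original, newer, repeat]) == [original, newer]
```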
Regarding the RTP header fields:
关于RTP头字段:
o If the whole RTP payload is repeated, all payload-specific fields in the RTP header (the M, TS, and PT fields) MUST keep their original values. The exception is the sequence number, which MUST be incremented to comply with RTP (the TOTAL/THIS fields make it possible to reassemble fragments carried under different sequence numbers).
o 如果整个RTP有效负载被重复,RTP报头中的所有有效负载特定字段(M、TS和PT字段)必须保持其原始值,序列号除外,序列号必须增加以符合RTP(字段总数/这允许使用不同序列号重新组装片段)。
o In packets containing single repeated units, the general rules in Section 3 for assigning values to the RTP header fields apply. Keeping the value of the RTP timestamp to preserve the timing of the units is particularly relevant here.
o 在包含单个重复单元的数据包中,第3节中为RTP报头字段赋值的一般规则适用。保持RTP时间戳的值以保持单元的定时在这里特别相关。
Apart from repetition, other mechanisms such as FEC [7], retransmission [11], or similar techniques could be used to cope with packet losses.
除了重复,其他机制,如FEC[7]、重传[11]或类似技术也可用于处理数据包丢失。
Congestion control for RTP SHALL be implemented in accordance with RTP [3] and the applicable RTP profile, e.g., RTP/AVP [17].
RTP的拥塞控制应根据RTP[3]和适用的RTP配置文件(如RTP/AVP[17])实施。
When using this payload format, two main factors may affect congestion control:
使用此有效负载格式时,主要有两个因素可能会影响拥塞控制:
o The use of (unit) aggregation may make the payload format more bandwidth efficient, by avoiding header overhead and thus reducing the used bitrate.
o 使用(单元)聚合可以通过避免报头开销从而降低所使用的比特率使有效负载格式更具带宽效率。
o The use of resilient transport mechanisms: Although timed text applications typically operate at low bitrates, the increase due to resilient transport shall be considered for congestion control mechanisms. This applies to all mechanisms but especially to less efficient ones like repetition.
o 弹性传输机制的使用:虽然定时文本应用程序通常以低比特率运行,但拥塞控制机制应考虑弹性传输带来的增加。这适用于所有机制,但特别是像重复这样效率较低的机制。
In order to set up a timed text session, regardless of the stream being stored in a 3GP file or streamed live, some initial layout information is needed by the communicating peers.
为了设置定时文本会话,无论流存储在3GP文件中还是流式直播中,通信对等方都需要一些初始布局信息。
+-------------------------------------------+
|      <-> tx                               |    +-------------+
|     +-------------------------------+     |<---|Display Area |
|  ^  |                               |     |    +-------------+
|  :  |                               |     |
|  :ty|                               |     |    +-------------+
|  :  |                               |<---------|Video track  |
|  :  |                               |     |    +-------------+
|  :  |                               |     |
|  :  |                               |     |
|  v  |                               |     |
|  -  |   x-------------------------+ |     |    +-------------+
|h ^  |   |                         |<-----------|Text Track   |
|e :  +---|-------------------------|-+     |    +-------------+
|i :      | +---------------------+ |       |
|g :      | |                     | |       |    +-------------+
|h :      | |                     |<------------ |Text Box     |
|t v      | +---------------------+ |       |    +-------------+
|  -      +-------------------------+       |
+-------------------------------------------+
          <........................>
                   w i d t h
Figure 18. Illustration of text rendering position and composition
图18。文本呈现位置和组成的图示
The parameters used for negotiating the position and size of the text track in the display area are shown in Figure 18. These are the "width" and "height" of the text track, its translation values, "tx" and "ty", and its "layer" or proximity to the user.
用于协商显示区域中文本轨迹的位置和大小的参数如图18所示。这些是文本轨迹的“宽度”和“高度”、其翻译值、“tx”和“ty”以及其“层”或与用户的距离。
At the same time, the sender of the stream needs to know the receiver's capabilities. In this case, these are the maximum allowable values for the text track height and width, "max-h" and "max-w", for the stream that the receiver shall display.
同时,流的发送方需要知道接收方的能力。在这种情况下,对于接收器应显示的流,文本轨迹高度和宽度的最大允许值为:“max-h”和“max-w”。
This layout information MUST be conveyed in a reliable form before the start of the session, e.g., during session announcement or in an Offer/Answer (O/A) exchange. An example of a reliable transport may be the out-of-band channel used for SDP. Sections 8 and 9 provide details on the mapping of these parameters to SDP descriptions and their usage in O/A.
该布局信息必须在会话开始前以可靠的形式传达,例如在会话公告期间或在要约/应答(O/A)交换中。可靠传输的示例可以是用于SDP的带外信道。第8节和第9节详细说明了这些参数到SDP描述的映射及其在O/A中的用法。
For stored content, the layout values expressing stream properties MUST be obtained from the Track Header Box. See Section 7.3.
对于存储内容,表示流属性的布局值必须从曲目标题框中获取。见第7.3节。
For live streaming, appropriate values as negotiated during session setup shall be used.
对于直播,应使用会话设置期间协商的适当值。
The attributes contained in the Track Header Boxes of a 3GP file only specify the spatial relationship of the tracks within the given 3GP file.
3GP文件的轨迹头框中包含的属性仅指定给定3GP文件中轨迹的空间关系。
If multiple 3GP files are sent, they require spatial synchronization. For example, for a text and video stream, the positions of the text and video tracks in Figure 18 shall be determined. For this purpose, SMIL [9] MAY be used.
如果发送多个3GP文件,它们需要空间同步。例如,对于文本和视频流,应确定图18中文本和视频轨迹的位置。为此,可以使用SMIL[9]。
SMIL assigns regions in the display to each of those files and places the tracks within those regions. Generally, in SMIL, the position of one track (or stream) is expressed relative to another track. This is different from the 3GP file, where the upper left corner is the reference for all translation offsets. Hence, only if the position in SMIL is relative to the video track origin does this translation offset have the same value as (tx, ty) in the 3GP file.
SMIL为每个文件指定显示区域,并将轨迹放置在这些区域内。通常,在SMIL中,一条轨迹(或流)的位置表示为相对于另一条轨迹的位置。这与3GP文件不同,3GP文件的左上角是所有平移偏移的参考。因此,仅当SMIL中的位置相对于视频轨迹原点时,该平移偏移量才具有与3GP文件中的(tx,ty)相同的值。
Note also that the original track header information is used for each track only within its region, as assigned by SMIL. Therefore, even if SMIL scene description is used, the track header information pieces SHOULD be sent anyway, as they represent the intrinsic media properties. See 3GPP SMIL Language Profile in [27] for details.
还请注意,原始曲目标题信息仅用于SMIL指定的每个曲目区域内的曲目。因此,即使使用SMIL场景描述,也应发送曲目标题信息,因为它们表示固有的媒体属性。有关详细信息,请参见[27]中的3GPP SMIL语言配置文件。
In a 3GP file, within the Track Header Box (tkhd):
在3GP文件中,在曲目标题框(tkhd)内:
o tx, ty: These values specify the translation offset of the (text) track relative to the upper left corner of the video track, if present. They are, respectively, the third-to-last and second-to-last values in the unity matrix. The values are fixed-point 16.16 values, restricted to be (signed) integers (i.e., the lower 16 bits of each value shall be all zeros). Therefore, only the upper 16 bits are used for obtaining the value of the media type parameters.
o tx,ty:这些值指定(文本)轨迹相对于视频轨迹左上角的平移偏移量(如果存在)。它们分别是单位矩阵中的倒数第三个和倒数第二个值;值为定点16.16值,限制为(有符号)整数(即,每个值的低16位应为全零)。因此,仅高16位用于获取媒体类型参数的值。
o width, height: They have the same name in the tkhd box. All (unsigned) 32 bits are meaningful.
o 宽度、高度:它们在tkhd框中具有相同的名称。所有(无符号)32位都是有意义的。
o layer: All (signed) 16 bits are used.
o 层:使用所有(有符号)16位。
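As an illustration of the fixed-point handling described for tx and ty, the sketch below extracts both values from the nine 32-bit entries of a Track Header Box matrix. It assumes that the 36 matrix bytes have already been located inside the tkhd box, that the entries are big-endian, and that tx and ty are the seventh and eighth entries (third-to-last and second-to-last); the function name is hypothetical.

```python
import struct
from typing import Tuple


def translation_from_matrix(matrix: bytes) -> Tuple[int, int]:
    """Return (tx, ty) in pixels from the 36-byte tkhd matrix field."""
    if len(matrix) != 36:
        raise ValueError("the matrix field is nine 32-bit values (36 bytes)")
    values = struct.unpack("!9i", matrix)    # nine signed big-endian integers
    tx_fixed, ty_fixed = values[6], values[7]
    for fixed in (tx_fixed, ty_fixed):
        if fixed & 0xFFFF:
            raise ValueError("fractional part must be zero for tx and ty")
    # Only the upper 16 bits carry the (signed) integer pixel offset.
    return tx_fixed >> 16, ty_fixed >> 16


# Unity matrix with a translation of 20 pixels right and 10 pixels up.
matrix = struct.pack("!9i", 0x00010000, 0, 0,
                     0, 0x00010000, 0,
                     20 << 16, -10 * 65536, 0x40000000)
assert translation_from_matrix(matrix) == (20, -10)
```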
The media subtype for the 3GPP Timed Text codec is allocated from the standards tree. The top-level media type under which this payload format is registered is 'video'. This registration is done using the template defined in [29] and following RFC 3555 [28].
3GPP定时文本编解码器的媒体子类型是从标准树中分配的。注册此有效负载格式的顶级媒体类型为“视频”。使用[29]中定义的模板和RFC 3555[28]中规定的模板进行注册。
The receiver MUST ignore any unrecognized parameter.
接收器必须忽略任何无法识别的参数。
Media type: video
媒体类型:视频
Media subtype: 3gpp-tt
媒体子类型:3gpp tt
Required parameters
所需参数
rate: Refer to Section 3 in RFC 4396.
费率:参考RFC 4396中的第3节。
sver: The parameter "sver" contains a list of supported backwards-compatible versions of the timed text format specification (3GPP TS 26.245) that the sender is willing to receive (and, equally, willing to send). The first value is the version it prefers to receive (or send). The first value MAY be followed by a comma-separated list of versions that SHOULD be used as alternatives. The order is meaningful: the first entry is the most preferred and the last is the least preferred. Each entry has the format Zi(xi*256+yi), where "Zi" is the number of the Release and "xi" and "yi" are taken from the 3GPP specification version (i.e., vZi.xi.yi). For example, for 3GPP TS 26.245 v6.0.0, Zi(xi*256+yi)=6(0), the version value is "60". (Note that "60" is the concatenation of the values Zi=6 and (xi*256+yi)=0 and not their product.)
sver:参数“sver”包含发送方接受接收的支持向后兼容的定时文本格式规范(3GPP TS 26.245)版本列表(与它愿意发送的版本相同)。第一个值是首选接收(或首选发送)的值。第一个值后面可能会有一个逗号分隔的版本列表,这些版本应该用作备选版本。顺序是有意义的,首先是最受欢迎的,最后是最不受欢迎的。每个条目都有ZI格式(席席256 + Yi),其中“Zi”是发行的数量,“席”和“Yi”取自3GPP规范版本(即VZI,XI,YI)。例如,对于3GPP TS 26.245v6.0.0,Zi(xi*256+yi)=6(0),版本值为“60”。(请注意,“60”是值Zi=6和(xi*256+yi)=0的串联,而不是它们的乘积。)
If no "sver" value is available, for example, when streaming out of a 3GP file, the default value "60", corresponding to the 3GPP Release 6 version of 3GPP TS 26.245, SHALL be used.
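The Zi(xi*256+yi) rule for "sver" entries can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the input is assumed to be a version string of the form vZ.x.y.

```python
def sver_value(version: str) -> str:
    """Map a 3GPP specification version such as 'v6.1.0' to a "sver" entry."""
    release, x, y = (int(part) for part in version.lstrip("vV").split("."))
    # The entry is the decimal release number concatenated (not multiplied)
    # with the decimal value of x*256 + y.
    return f"{release}{x * 256 + y}"
```

Under this rule, v6.0.0 gives "60" and v6.1.0 gives "6256", so a sender that prefers the latter but also supports the former would signal "sver=6256,60".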
Optional parameters:
tx: This parameter indicates the horizontal translation offset in pixels of the text track with respect to the origin of the video track. This value is the decimal representation of a 16-bit signed integer. Refer to 3GPP TS 26.245 for an illustration of this parameter.
ty: This parameter indicates the vertical translation offset in pixels of the text track with respect to the origin of the video track. This value is the decimal representation of a 16-bit signed integer. Refer to 3GPP TS 26.245 for an illustration of this parameter.
layer: This parameter indicates the proximity of the text track to the viewer. More negative values mean closer to the viewer. This parameter has no units. This value is the decimal representation of a 16-bit signed integer.
tx3g: This parameter MUST be used for conveying sample descriptions out-of-band. It contains a comma-separated list of base64-encoded entries. The entries MAY appear in any order, and the list SHALL NOT be empty. Each entry is the result of running base64 encoding over the concatenation of the (static) SIDX value, as an 8-bit unsigned integer, and the (static) sample description for that SIDX, in that order. The format of a sample description entry can be found in 3GPP TS 26.245 Release 6 and later releases. All servers and clients MUST understand this parameter and MUST be capable of using the sample description(s) contained in it. Please refer to RFC 3548 [6] for details on the base64 encoding.
width: This parameter indicates the width in pixels of the text track or area of the text being sent. This value is the decimal representation of a 32-bit unsigned integer. Refer to 3GPP TS 26.245 for an illustration of this parameter.
height: This parameter indicates the height in pixels of the text track being sent. This value is the decimal representation of a 32-bit unsigned integer. Refer to 3GPP TS 26.245 for an illustration of this parameter.
max-w: This parameter indicates display capabilities. This is the maximum "width" value that the sender of this parameter supports. This value is the decimal representation of a 32-bit unsigned integer.
max-h: This parameter indicates display capabilities. This is the maximum "height" value that the sender of this parameter supports. This value is the decimal representation of a 32-bit unsigned integer.
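The construction of the "tx3g" value described above (one SIDX octet followed by the sample description, base64-encoded, entries joined by commas) might look as follows in Python. The function names are hypothetical, and the byte strings used with them are placeholders rather than real sample descriptions from 3GPP TS 26.245.

```python
import base64
from typing import Dict


def build_tx3g(sample_descriptions: Dict[int, bytes]) -> str:
    """Encode {SIDX: sample description bytes} as a "tx3g" parameter value."""
    if not sample_descriptions:
        raise ValueError("the tx3g list SHALL NOT be empty")
    entries = []
    for sidx, description in sample_descriptions.items():
        if not 0 <= sidx <= 255:
            raise ValueError("SIDX must fit in an 8-bit unsigned integer")
        # SIDX first, then the sample description, in that order.
        raw = bytes([sidx]) + description
        entries.append(base64.b64encode(raw).decode("ascii"))
    return ",".join(entries)


def parse_tx3g(value: str) -> Dict[int, bytes]:
    """Decode a "tx3g" parameter value back into {SIDX: description bytes}."""
    result = {}
    for entry in value.split(","):
        raw = base64.b64decode(entry.strip(), validate=True)
        if not raw:
            raise ValueError("empty tx3g entry")
        result[raw[0]] = raw[1:]
    return result
```

Because the standard base64 alphabet contains no comma, splitting the parameter value on "," recovers the individual entries without ambiguity.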
Encoding considerations:
This media type is framed (see Section 4.8 in [29]) and partially contains binary data.
Restrictions on usage:
This media type depends on RTP framing, and hence is only defined for transfer via RTP [3]. Transport within other framing protocols is not defined at this time.
Security considerations:
Please refer to Section 11 of RFC 4396.
Interoperability considerations:
The 3GPP Timed Text media format and its file storage are specified in Release 6 of 3GPP TS 26.245, "Transparent end-to-end packet switched streaming service (PSS); Timed Text Format (Release 6)". Note also that 3GPP may, in future releases, specify extensions or updates to the timed text media format in a backwards-compatible way, e.g., new modifier boxes or extensions to the sample descriptions. The payload format defined in RFC 4396 allows for such extensions. For future 3GPP Releases of the Timed Text Format, the parameter "sver" is used to identify the exact specification used.
The defined storage format for 3GPP Timed Text format is the 3GPP File Format (3GP) [30]. 3GP files may be transferred using the media type video/3gpp as registered by RFC 3839 [31]. The 3GPP File Format is a container file that may contain, e.g., audio and video that may be synchronized with the 3GPP Timed Text.
Published specification: RFC 4396
Applications which use this media type:
Multimedia streaming applications.
Additional information:
The 3GPP Timed Text media format is specified in 3GPP TS 26.245, "Transparent end-to-end packet switched streaming service (PSS); Timed Text Format (Release 6)". This document and future extensions to the 3GPP Timed Text format are publicly available at http://www.3gpp.org.
Magic number(s): None.
File extension(s): None.
Macintosh File Type Code(s): None.
Person & email address to contact for further information:
Jose Rey, jose.rey@eu.panasonic.com
Yoshinori Matsui, matsui.yoshinori@jp.panasonic.com
Audio/Video Transport Working Group.
Intended usage: COMMON
Authors: Jose Rey, Yoshinori Matsui
Change controller: IETF Audio/Video Transport Working Group delegated from the IESG.
The information carried in the media type specification has a specific mapping to fields in SDP [4]. If SDP is used to specify sessions using this payload format, the mapping is done as follows:
o The media type ("video") goes in the SDP "m=" as the media name.
m=video <port number> RTP/<RTP profile> <dynamic payload type>
o The media subtype ("3gpp-tt") and the timestamp clockrate "rate" (the RECOMMENDED 1000 Hz or another value) go in the SDP "a=rtpmap" line as the encoding name and rate, respectively:
a=rtpmap:<payload type> 3gpp-tt/1000
o The REQUIRED parameter "sver" goes in the SDP "a=fmtp" attribute by copying it directly from the media type string as a semicolon-separated parameter=value pair.
o The OPTIONAL parameters "tx", "ty", "layer", "tx3g", "width", "height", "max-w" and "max-h" go in the SDP "a=fmtp" attribute by copying them directly from the media type string as a semicolon separated list of parameter=value(s) pairs:
a=fmtp:<dynamic payload type> <parameter name>=<value>[,<value>][; <parameter name>=<value>]
o Any parameter unknown to the device that uses the SDP SHALL be ignored. For example, parameters added to the media format in later specifications MAY be copied into the SDP and SHALL be ignored by receivers that do not understand them.
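The mapping rules in the list above can be sketched as a pair of small Python helpers: one that emits the "m=", "a=rtpmap", and "a=fmtp" lines, and one that reads an "a=fmtp" line back while ignoring unknown parameters. The function names, the default profile, and the port used in the usage note are illustrative assumptions, not part of this specification.

```python
from typing import Dict, List

# Parameters defined by this media type registration for "a=fmtp".
KNOWN_PARAMS = {"sver", "tx", "ty", "layer", "tx3g",
                "width", "height", "max-w", "max-h"}


def build_media_section(port: int, payload_type: int, params: Dict[str, str],
                        rate: int = 1000, profile: str = "AVP") -> List[str]:
    """Return the m=, a=rtpmap, and a=fmtp lines for a 3gpp-tt stream."""
    if "sver" not in params:
        raise ValueError('the REQUIRED parameter "sver" is missing')
    fmtp = "; ".join(f"{name}={value}" for name, value in params.items())
    return [
        f"m=video {port} RTP/{profile} {payload_type}",
        f"a=rtpmap:{payload_type} 3gpp-tt/{rate}",
        f"a=fmtp:{payload_type} {fmtp}",
    ]


def parse_fmtp(line: str) -> Dict[str, str]:
    """Parse an a=fmtp line, silently ignoring unknown parameters."""
    _, _, rest = line.partition(" ")
    params = {}
    for item in rest.split(";"):
        # Split on the first "=" only, so base64 padding in tx3g survives.
        name, sep, value = item.strip().partition("=")
        if sep and name in KNOWN_PARAMS:
            params[name] = value
    return params
```

For example, build_media_section(49170, 98, {"tx": "100", "sver": "6256,60"}) produces an "a=fmtp:98 tx=100; sver=6256,60" line, and parse_fmtp() applied to a line that also carries an unrecognized "foo=bar" pair simply drops that pair.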
In this section, the meaning of the SDP parameters defined in this document within the Offer/Answer [13] context is explained.
In unicast, sender and receiver typically negotiate the streams, i.e., which codecs and parameter values are used in the session. This is also possible in multicast to a lesser extent.
Additionally, the meaning of the parameters MAY vary depending on which direction is used. In the following sections, a "<directionality> offer" means an offer that contains a stream set to <directionality>. <directionality> may take the values sendrecv, sendonly, and recvonly. Similar considerations apply for answers. For example, an answer to a sendonly offer is a recvonly answer.
The following types of parameters are used in this payload format:
1. Declarative parameters: Offerer and answerer declare the values they will use for the incoming (sendrecv/recvonly) or outgoing (sendonly) stream. Offerer and answerer MAY use different values.
a. "tx", "ty", and "layer": These are parameters describing where the received text track is placed. Depending on the directionality:
i. They MUST appear in all sendrecv offers and answers and in all recvonly offers and answers (thus applying to the incoming stream). In the case of sendrecv offers and answers and in recvonly offers, these values SHOULD be used by the sender of the stream unless it has a particular preference, in which case, it MUST make sure that these different values do not corrupt the presentation. For recvonly answers, the answerer MAY accept the proposed values for the incoming stream (in a sendonly offer; see ii. below) or respond with different ones. The offerer MUST use the returned values.
ii. They MAY appear in sendonly offers and MUST appear in sendonly answers. In sendonly offers, they specify the values that the offerer proposes for sending (see example in Section 9.3). In sendonly answers, these values SHOULD be copied from the corresponding recvonly offer upon accepting the stream, unless a particular preference by the receiver of the stream exists, as explained in the previous point.
2. Parameters describing the display capabilities, "max-h" and "max-w", which indicate the maximum dimensions of the text track (text display area) for the incoming stream "tx" and "ty" values (see Figure 18). "max-h" and "max-w" MUST be included in all offers and answers where "tx" and "ty" refer to the incoming stream, thus excluding sendonly offers and answers (see example in Section 9.3), where they SHALL NOT be present.
3. Parameters describing the sent stream properties, i.e., the sender of the stream decides upon the values of these:
a. "width" and "height" specify the text track dimensions. They SHALL ALWAYS be present in sendrecv and sendonly offers and answers. For recvonly answers, the answerer MUST include the offered parameter values (if any) verbatim in the answer upon accepting the stream.
b. "tx3g" contains static sample descriptions. It MAY only be present in sendrecv and sendonly offers and answers. This parameter applies to the stream that offerers or answerers send.
4. Negotiable parameters, which MUST be agreed on. This is the case of "sver". This parameter MUST be present in every offer and answer. The answerer SHALL choose one supported value from the offerer's list, or else it MUST remove the stream or reject the session.
5. Symmetric parameters: "rate", timestamp clockrate, belongs to this class. Symmetric parameters MUST be echoed verbatim in the answer. Otherwise, the stream MUST be removed or the session rejected.
The following table summarizes all options:
+--------------+---------------+----------+----------+----------+
| Type of      | Parameter     | sendrecv | recvonly | sendonly |
| parameter    |               |   O/A    |   O/A    |   O/A    |
+--------------+---------------+----------+----------+----------+
| Declarative  | tx, ty, layer |   M/M    |   M/M    |   m/M    |
+--------------+---------------+----------+----------+----------+
| Display      | max-h, max-w  |   M/M    |   M/M    |   -/-    |
| capabilities |               |          |          |          |
+--------------+---------------+----------+----------+----------+
| Stream       | height, width |   M/M    |  -/(M)   |   M/M    |
| properties   | tx3g          |   m/m    |   -/-    |   m/m    |
+--------------+---------------+----------+----------+----------+
| Negotiable   | sver          |   M/M    |   M/M    |   M/M    |
+--------------+---------------+----------+----------+----------+
| Symmetric    | rate          |   M/M    |   M/M    |   M/M    |
+--------------+---------------+----------+----------+----------+
Table 1. Parameter usage in Unicast Offer / Answer.
KEY:
o M means MUST be present.
o m means MAY be present (such as proposed values).
o (M) or (m) means MUST or MAY, if applicable.
o A hyphen ("-") means the parameter MUST NOT be present.
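As a rough illustration of how Table 1 could be applied, the following Python sketch encodes the "a=fmtp" rows of the table and reports violations for a given set of present parameters. The data layout and function name are hypothetical. The "rate" row is omitted because that value travels in "a=rtpmap", and "(M)" is treated as permitted without being enforced, since whether it applies depends on the offer.

```python
from typing import Iterable, List

# (parameter names, {directionality: "offer rule/answer rule"}), per Table 1.
_TABLE_1 = [
    (("tx", "ty", "layer"),
     {"sendrecv": "M/M", "recvonly": "M/M", "sendonly": "m/M"}),
    (("max-h", "max-w"),
     {"sendrecv": "M/M", "recvonly": "M/M", "sendonly": "-/-"}),
    (("height", "width"),
     {"sendrecv": "M/M", "recvonly": "-/(M)", "sendonly": "M/M"}),
    (("tx3g",),
     {"sendrecv": "m/m", "recvonly": "-/-", "sendonly": "m/m"}),
    (("sver",),
     {"sendrecv": "M/M", "recvonly": "M/M", "sendonly": "M/M"}),
]


def check_fmtp_params(present: Iterable[str], directionality: str,
                      role: str) -> List[str]:
    """Return a list of Table 1 violations; role is "offer" or "answer"."""
    present = set(present)
    index = {"offer": 0, "answer": 1}[role]
    problems = []
    for names, columns in _TABLE_1:
        rule = columns[directionality].split("/")[index]
        for name in names:
            if rule == "M" and name not in present:
                problems.append(f"{name} MUST be present")
            elif rule == "-" and name in present:
                problems.append(f"{name} MUST NOT be present")
            # "m" and "(M)" impose no unconditional requirement here.
    return problems
```

A recvonly offer that carried "height", for instance, would be reported with "height MUST NOT be present", while a sendonly offer containing only "height", "width", and "sver" would pass.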
Other observations regarding parameter usage:
o Translation and transparency values: In sendonly offers, "tx", "ty", and "layer" indicate proposed values. This is useful for visually composed sessions where the different streams occupy different parts of the display, e.g., a video stream and the captions. These are just suggested values; the peer rendering the text ultimately decides where to place the text track.
o Text track (area) dimensions, "height" and "width": In the case of sendonly offers, an answerer accepting the offer MUST be prepared to render the stream using these values. If any of these conditions are not met, the stream MUST be removed or the session rejected.
o Display capabilities, "max-h" and "max-w": An answerer sending a stream SHALL ensure that the "height" and "width" values in the answer are compatible with the offerer's signaled capabilities.
o Version handling via "sver": The idea is that offerer and answerer communicate using the same version. This is achieved by letting the answerer choose from a list of supported versions, "sver". For recvonly streams, the first value in the list is the preferred version to receive. Consequently, for sendonly (and sendrecv) streams, the first value is the one preferred for sending (and receiving). The answerer MUST choose one value and return it in the answer. Upon receiving the answer, the offerer SHALL be prepared to send (sendonly and sendrecv) and receive (recvonly and sendrecv) a stream using that version. If none of the versions in the list is supported, the stream MUST be removed or the session rejected. Note that, if alternative non-compatible versions are offered, then this SHALL be done using different payload types.
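The version selection step described for "sver" can be sketched as follows. Picking the first offered value that the answerer supports is an assumption that honors the offerer's preference order; the text only requires that the answerer choose one supported value. The function name is hypothetical.

```python
from typing import Optional, Set


def choose_sver(offered: str, supported: Set[str]) -> Optional[str]:
    """Pick one "sver" value from the offer, or None if none is supported."""
    for version in (entry.strip() for entry in offered.split(",")):
        if version in supported:
            return version
    # No common version: the stream is removed or the session rejected.
    return None
```

With an offer of "sver=6256,60", an answerer that supports only "60" would return "60" in its answer, whereas a None result corresponds to removing the stream or rejecting the session.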
In multicast, the parameter usage is similar to the unicast case, except as follows:
o the parameters "tx", "ty", and "layer" in multicast offers only have meaning for sendrecv and recvonly streams. In order for all clients to have the same vision of the session, they MUST be used symmetrically.
o for "height", "width", and "tx3g" (for sendrecv and sendonly), multicast offers specify which values of these parameters the participants MUST use for sending. Thus, if the stream is accepted, the answerer MUST also include them verbatim in the answer (also "tx3g", if present).
o The capability parameters, "max-h" and "max-w", SHALL NOT be used in multicast. If the offered text track should change in size, a new offer SHALL be used instead.
o Regarding version handling:
In the case of multicast offers, an answerer MAY accept a multicast offer as long as one of the versions listed in the "sver" is supported. Therefore, if the stream is accepted, the answerer MUST choose its preferred version, but, unlike in unicast, the offerer SHALL NOT change the offered stream to this chosen version because there may be other session participants that do support the newer extensions. Consequently, different session participants may end up using different backwards-compatible media format versions. It is RECOMMENDED that the multicast offer contains a limited number of versions, in order for all participants to have the same view of the session. This is a responsibility of the session creator. If none of the offered versions is supported, the stream SHALL be removed or the session rejected. Also in this case, if alternative non-compatible versions are offered, then this SHALL be done using different payload types.
In these unicast O/A examples, the long lines are wrapped around. Static sample descriptions are shortened for clarity.
For sendrecv:
O -> A
   m=video <port> RTP/AVP 98
   a=rtpmap:98 3gpp-tt/1000
   a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100; max-h=120; max-w=160; sver=6256,60; tx3g=81...
   a=sendrecv
A -> O
   m=video <port> RTP/AVP 98..
   a=rtpmap:98 3gpp-tt/1000
   a=fmtp:98 tx=100; ty=95; layer=0; height=90; width=100; max-h=100; max-w=160; sver=60; tx3g=82...
   a=sendrecv
In this example, the offerer is telling the answerer where it will place the received stream and what is the maximum height and width allowable for the stream that it will receive. Also, it tells the answerer the dimensions of the text track for the stream sent and which sample description it shall use. It offers two versions, 6256 and 60. The answerer responds with an equivalent set of parameters for the stream it receives. In this case, the answerer's "max-h" and "max-w" are compatible with the offerer's "height" and "width". Otherwise, the answerer would have to remove this stream, and the offerer would have to issue a new offer taking the answerer's capabilities into account. This is possible only if multiple payload types are present in the initial offer so that at least one of them matches the answerer's capabilities as expressed by "max-h" and "max-w" in the negative answer. Note also that the answerer's text box dimensions fit within the maximum values signaled in the offer. Finally, the answerer chooses to use version 60 of the timed text format.
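The compatibility check mentioned in this example can be written as a one-line comparison. The sketch below assumes that "compatible" means the sent text track is no larger than the receiver's signaled maximum in either dimension; the function name and the dictionary representation of the "a=fmtp" parameters are hypothetical.

```python
from typing import Mapping


def fits_capabilities(stream: Mapping[str, str],
                      receiver: Mapping[str, str]) -> bool:
    """True if the sender's height/width fit the receiver's max-h/max-w."""
    return (int(stream["height"]) <= int(receiver["max-h"])
            and int(stream["width"]) <= int(receiver["max-w"]))


# Values taken from the sendrecv offer and answer of this example.
offer = {"height": "80", "width": "100", "max-h": "120", "max-w": "160"}
answer = {"height": "90", "width": "100", "max-h": "100", "max-w": "160"}
```

Both directions hold for the values shown: the offerer's 80x100 track fits the answerer's 100x160 limit, and the answerer's 90x100 track fits the offerer's 120x160 limit.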
For recvonly:
Offerer -> Answerer
   m=video <port> RTP/AVP 98
   a=rtpmap:98 3gpp-tt/1000
   a=fmtp:98 tx=100; ty=100; layer=0; max-h=120; max-w=160; sver=6256,60
   a=recvonly
A -> O
   m=video <port> RTP/AVP 98..
   a=rtpmap:98 3gpp-tt/1000
   a=fmtp:98 tx=100; ty=100; layer=0; height=90; width=100; sver=60; tx3g=82...
   a=sendonly
In this case, the offer is different from the previous case: It does not include the stream properties "height", "width", and "tx3g". The answerer copies the "tx", "ty", and "layer" values, thus acknowledging these. "max-h" and "max-w" are not present in the answer because the "tx" and "ty" (and "layer") in this special case do not apply to the received stream, but to the sent stream. Also, if offerer and answerer had very different display sizes, it would not be possible to express the answerer's capabilities. In the example above and for an answerer with a 50x50 display, the translation values are already out of range.
For sendonly:
O -> A
   m=video <port> RTP/AVP 98
   a=rtpmap:98 3gpp-tt/1000
   a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100; sver=6256,60; tx3g=81...
   a=sendonly
A -> O
   m=video <port> RTP/AVP 98..
   a=rtpmap:98 3gpp-tt/1000
   a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100; max-h=100; max-w=160; sver=60
   a=recvonly
Note that "max-h" and "max-w" are not present in the offer. Also, with this answer, the answerer would accept the offer as is (thus echoing "tx", "ty", "height", "width", and "layer") and additionally inform the offerer about its capabilities: "max-h" and "max-w".
Another possible answer for this case would be:
A -> O
   m=video <port> RTP/AVP 98..
   a=rtpmap:98 3gpp-tt/1000
   a=fmtp:98 tx=120; ty=105; layer=0; max-h=95; max-w=150; sver=60
   a=recvonly
In this case, the answerer does not accept the values offered. The offerer MUST use these values or else remove the stream.
SDP may also be employed outside of the Offer/Answer context, for instance for multimedia sessions that are announced through the Session Announcement Protocol (SAP) [14] or streamed through the Real Time Streaming Protocol (RTSP) [15].
In this case, the receiver of a session description is required to support the parameters and given values for the streams, or else it MUST reject the session. It is the responsibility of the sender (or creator) of the session descriptions to define the session parameters so that the probability of unsuccessful session setup is minimized. This is out of the scope of this document.
IANA has registered the media subtype name "3gpp-tt" for the media type "video" as specified in Section 8 of this document.
RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [3] and any applicable RTP profile, e.g., AVP [17].
In particular, an attacker may invalidate the current set of active sample descriptions at the client by means of repeating a packet with an old sample description, i.e., replay attack. This would mean that the display of the text would be corrupted, if displayed at all. Another form of attack may consist of sending redundant fragments, whose boundaries do not match the exact boundaries of the originals (as indicated by LEN) or fragments that carry different sample lengths (SLEN). This may cause a decoder to crash.
These types of attack may easily be avoided by using source authentication and integrity protection.
Additionally, peers in a timed text session may desire to retain privacy in their communication, i.e., confidentiality.
This payload format does not provide any mechanisms for achieving these. Confidentiality, integrity protection, and authentication have to be solved by a mechanism external to this payload format, e.g., SRTP [10].
[1] Transparent end-to-end packet switched streaming service (PSS); Timed Text Format (Release 6), TS 26.245 v 6.0.0, June 2004.
[2] ISO/IEC 14496-12:2004 Information technology - Coding of audio-visual objects - Part 12: ISO base media file format.
[3] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.
[4] Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.
[5] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[6] Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 3548, July 2003.
[7] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for Generic Forward Error Correction", RFC 2733, December 1999.
[8] Perkins, C. and O. Hodson, "Options for Repair of Streaming Media", RFC 2354, June 1998.
[9] W3C, "Synchronised Multimedia Integration Language (SMIL 2.0)", August, 2001.
[10] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.
[11] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R. Hakenberg, "RTP Retransmission Payload Format", Work in Progress, September 2005.
[12] van der Meer, J., Mackie, D., Swaminathan, V., Singer, D., and P. Gentric, "RTP Payload Format for Transport of MPEG-4 Elementary Streams", RFC 3640, November 2003.
[13] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.
[14] Handley, M., Perkins, C., and E. Whelan, "Session Announcement Protocol", RFC 2974, October 2000.
[15] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming Protocol (RTSP)", RFC 2326, April 1998.
[16] Transparent end-to-end packet switched streaming service (PSS); Protocols and codecs (Release 6), TS 26.234 v 6.1.0, September 2004.
[17] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, July 2003.
[18] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, November 2003.
[19] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646", RFC 2781, February 2000.
[20] Friedman, T., Caceres, R., and A. Clark, "RTP Control Protocol Extended Reports (RTCP XR)", RFC 3611, November 2003.
[21] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, "Extended RTP Profile for RTCP-based Feedback (RTP/AVPF)", Work in Progress, August 2004.
[22] Hellstrom, G., "RTP Payload for Text Conversation", RFC 2793, May 2000.
[23] Hellstrom, G. and P. Jones, "RTP Payload for Text Conversation", RFC 4103, June 2005.
[24] ITU-T Recommendation T.140 (1998) - Text conversation protocol for multimedia application, with amendment 1, (2000).
[25] ISO/IEC 10646-1: (1993), Universal Multiple Octet Coded Character Set.
[26] ISO/IEC FCD 14496-17 Information technology - Coding of audio-visual objects - Part 17: Streaming text format, Work in progress, June 2004.
[27] Transparent end-to-end Packet-switched Streaming Service (PSS); 3GPP SMIL language profile, (Release 6), TS 26.246 v 6.0.0, June 2004.
[28] Casner, S. and P. Hoschka, "MIME Type Registration of RTP Payload Formats", RFC 3555, July 2003.
[29] Freed, N. and J. Klensin, "Media Type Specifications and Registration Procedures", BCP 13, RFC 4288, December 2005.
[30] Transparent end-to-end packet switched streaming service (PSS); 3GPP file format (3GP) (Release 6), TS 26.244 V6.3, March 2005.
[31] Castagno, R. and D. Singer, "MIME Type Registrations for 3rd Generation Partnership Project (3GPP) Multimedia files", RFC 3839, July 2004.
This section provides a coarse overview of the 3GP file structure, which follows the ISO Base Media file Format [2].
Each 3GP file consists of "Boxes". In general, a 3GP file contains the File Type Box (ftyp), the Movie Box (moov), and the Media Data Box (mdat). The File Type Box identifies the type and properties of the 3GP file itself. The Movie Box and the Media Data Box serve as containers and include their own boxes for each medium. Every box starts with a header that indicates its size and type (the fields are named "size" and "type", respectively). Depending on its type, a box may in turn contain a number of other boxes.
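The box layout described above can be sketched in a few lines of Python. This is only an illustration: the header encoding (a 32-bit big-endian "size" followed by a four-character "type", with size 1 signalling a 64-bit size and size 0 meaning "to the end of the container") is taken from the ISO Base Media File Format [2] rather than from this document, and `iter_boxes` and `make_box` are hypothetical helper names.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload_start, box_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # Assumption from ISO BMFF: a 64-bit size follows the type field.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # Assumption from ISO BMFF: the box runs to the end of its container.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError(f"malformed box at offset {offset}")
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size


def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size field (demo helper only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


# A toy "file": ftyp, moov (holding one empty trak), and mdat.
demo = (make_box("ftyp", b"3gp6" + bytes(4))
        + make_box("moov", make_box("trak"))
        + make_box("mdat", b"\x00\x05hello"))

top_level = [(t, stop - start) for t, start, stop in iter_boxes(demo)]
print(top_level)
```

Calling `iter_boxes` again on the payload range of a container box (for example, the range reported for "moov") lists the boxes nested inside it.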
Only the boxes that are relevant to this payload format are described below.
The Movie Box (moov) contains one or more Track Boxes (trak), which include information about each track. A Track Box contains, among others, the Track Header Box (tkhd), the Media Header Box (mdhd), and the Media Information Box (minf).
The Track Header Box specifies the characteristics of a single track, where a track is, in this case, the streamed text during a session. Exactly one Track Header Box is present for a track. It contains information about the track, such as the spatial layout (width and height), the video transformation matrix, and the layer number. Since these pieces of information are essential and static (i.e., constant) for the duration of the session, they must be sent prior to the transmission of any text samples.
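As a rough illustration of where these fields sit, the sketch below reads the layer, the transformation matrix, and the width and height from a Track Header Box payload (the bytes after the "size" and "type" header). The byte layout is an assumption taken from the ISO Base Media File Format [2], not from this document, and `parse_tkhd` and `build_tkhd_v0` are hypothetical helper names.

```python
import struct

# Identity matrix as stored in the file: 16.16 fixed point, except the
# last column, which uses 2.30 fixed point (assumed from ISO BMFF).
IDENTITY = (0x00010000, 0, 0, 0, 0x00010000, 0, 0, 0, 0x40000000)


def build_tkhd_v0(layer, matrix, width, height):
    """Build a version-0 tkhd payload (demo helper only)."""
    return (bytes(4)                                   # version 0, flags 0
            + struct.pack(">IIIII", 0, 0, 1, 0, 0)     # times, track_ID, reserved, duration
            + bytes(8)                                 # reserved
            + struct.pack(">hhhH", layer, 0, 0, 0)     # layer, alt. group, volume, reserved
            + struct.pack(">9i", *matrix)
            + struct.pack(">II", width << 16, height << 16))


def parse_tkhd(payload):
    """Extract the static layout fields from a tkhd payload."""
    version = payload[0]
    # Version 0 carries 32-bit times and duration; version 1 carries 64-bit ones.
    pos = 4 + (20 if version == 0 else 32) + 8
    (layer,) = struct.unpack_from(">h", payload, pos)
    matrix = struct.unpack_from(">9i", payload, pos + 8)
    width, height = struct.unpack_from(">II", payload, pos + 44)
    return {
        "layer": layer,
        "matrix": matrix,
        "width": width / 65536,    # 16.16 fixed point to a plain number
        "height": height / 65536,
    }


demo_tkhd = build_tkhd_v0(layer=-1, matrix=IDENTITY, width=320, height=60)
info = parse_tkhd(demo_tkhd)
print(info["layer"], info["width"], info["height"])
```

Because these values do not change during the session, a sender would read them once from the file and convey them before any text sample is transmitted.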
The Media Header Box contains the "timescale" or number of time units that pass in one second, i.e., cycles per second or Hertz. The Media Information Box includes the Sample Table Box (stbl), which contains all the time and data indexing of the media samples in a track. Using this box, it is possible to locate samples in time and to determine their type, size, container, and offset into that container. Inside the Sample Table Box, we can find the Sample Description Box (stsd, for finding sample descriptions), the Decoding Time to Sample Box (stts, for finding sample duration), the Sample Size Box (stsz), and the Sample to Chunk Box (stsc, for finding the sample description index).
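To make the relationship between the timescale and the sample durations concrete, the following sketch converts the entries of a Decoding Time to Sample Box into start times and durations in seconds. The stts payload layout (version and flags, an entry count, then pairs of sample count and sample delta) is an assumption taken from the ISO Base Media File Format [2]; `parse_stts` and `sample_times` are hypothetical helper names, and the timescale of 1000 is an arbitrary example value.

```python
import struct
from fractions import Fraction


def parse_stts(payload):
    """Return the (sample_count, sample_delta) pairs of an stts payload."""
    (entry_count,) = struct.unpack_from(">I", payload, 4)  # skip version/flags
    return [struct.unpack_from(">II", payload, 8 + 8 * i)
            for i in range(entry_count)]


def sample_times(stts_entries, timescale):
    """Return (start, duration) in seconds for every sample in the track."""
    ticks = 0
    result = []
    for sample_count, sample_delta in stts_entries:
        for _ in range(sample_count):
            result.append((Fraction(ticks, timescale),
                           Fraction(sample_delta, timescale)))
            ticks += sample_delta
    return result


# Two samples lasting 3000 units each, then one lasting 1500 units.
demo_stts = (bytes(4) + struct.pack(">I", 2)
             + struct.pack(">II", 2, 3000)
             + struct.pack(">II", 1, 1500))

times = sample_times(parse_stts(demo_stts), timescale=1000)
print([(float(start), float(duration)) for start, duration in times])
```

Exact fractions are used so that timescales that do not divide evenly (for example, 90000 units per second) accumulate no rounding error.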
Finally, the Media Data Box contains the media data itself. In timed text tracks, this box contains text samples, which are the counterpart of the frames in audio and video tracks. A text sample consists of the text length, the text string, and one or several Modifier Boxes. The text length is the size of the text in bytes. The text string is the plain text to render. The Modifier Boxes carry information that is rendered in addition to the text, such as color and font.
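The structure of a text sample can be sketched as follows. The sketch assumes, based on the 3GPP timed text format rather than on this appendix, that the text length is a 16-bit big-endian integer, that UTF-16 text is recognizable by a leading byte order mark (UTF-8 otherwise), and that each Modifier Box uses the ordinary box header. `parse_text_sample` is a hypothetical helper name, and the "styl" payload in the demo is left uninterpreted.

```python
import struct


def parse_text_sample(sample):
    """Split a text sample into its text string and its modifier boxes."""
    (text_len,) = struct.unpack_from(">H", sample, 0)
    raw = sample[2:2 + text_len]
    if len(raw) != text_len:
        raise ValueError("text length exceeds sample size")
    # Assumption: a byte order mark signals UTF-16; otherwise the text is UTF-8.
    if raw[:2] in (b"\xfe\xff", b"\xff\xfe"):
        text = raw.decode("utf-16")
    else:
        text = raw.decode("utf-8")

    modifiers = []
    pos = 2 + text_len
    while pos + 8 <= len(sample):
        size, box_type = struct.unpack_from(">I4s", sample, pos)
        if size < 8 or pos + size > len(sample):
            raise ValueError(f"malformed modifier box at offset {pos}")
        modifiers.append((box_type.decode("latin-1"), sample[pos + 8:pos + size]))
        pos += size
    return text, modifiers


# "Hi!" followed by one modifier box with a two-byte payload.
demo_sample = (struct.pack(">H", 3) + b"Hi!"
               + struct.pack(">I4s", 10, b"styl") + b"\x00\x00")

text, modifiers = parse_text_sample(demo_sample)
print(text, modifiers)
```

Keeping the modifier payloads as opaque bytes lets the caller decide which modifier types to interpret.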
The authors would like to thank Dave Singer, Jan van der Meer, Magnus Westerlund, and Colin Perkins for their comments and suggestions about this document.
The authors would also like to thank Markus Gebhard for the free and publicly available JavE ASCII Editor (used for the ASCII drawings in this document) and Henrik Levkowetz for the Idnits web service.
Authors' Addresses
Jose Rey Panasonic R&D Center Germany GmbH Monzastr. 4c D-63225 Langen, Germany
EMail: jose.rey@eu.panasonic.com Phone: +49-6103-766-134 Fax: +49-6103-766-166
Yoshinori Matsui Matsushita Electric Industrial Co., LTD. 1006 Kadoma Kadoma-shi, Osaka, Japan
EMail: matsui.yoshinori@jp.panasonic.com Phone: +81 6 6900 9689 Fax: +81 6 6900 9699
Full Copyright Statement
Copyright (C) The Internet Society (2006).
This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr. The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.
Acknowledgement
Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).