Internet Engineering Task Force (IETF)                     T. Terriberry
Request for Comments: 7845                           Mozilla Corporation
Updates: 5334                                                     R. Lee
Category: Standards Track                                    Voicetronix
ISSN: 2070-1721                                                 R. Giles
                                                     Mozilla Corporation
                                                              April 2016
        

Ogg Encapsulation for the Opus Audio Codec

Abstract

This document defines the Ogg encapsulation for the Opus interactive speech and audio codec. This allows data encoded in the Opus format to be stored in an Ogg logical bitstream.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7845.

Copyright Notice

Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Packet Organization
   4.  Granule Position
     4.1.  Repairing Gaps in Real-Time Streams
     4.2.  Pre-skip
     4.3.  PCM Sample Position
     4.4.  End Trimming
     4.5.  Restrictions on the Initial Granule Position
     4.6.  Seeking and Pre-roll
   5.  Header Packets
     5.1.  Identification Header
       5.1.1.  Channel Mapping
     5.2.  Comment Header
       5.2.1.  Tag Definitions
   6.  Packet Size Limits
   7.  Encoder Guidelines
     7.1.  LPC Extrapolation
     7.2.  Continuous Chaining
   8.  Security Considerations
   9.  Content Type
   10. IANA Considerations
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Acknowledgments
   Authors' Addresses
        
1. Introduction

The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See [RFC6716] for technical details. This document defines the encapsulation of Opus in a continuous, logical Ogg bitstream [RFC3533]. Ogg encapsulation provides Opus with a long-term storage format supporting all of the essential features, including metadata, fast and accurate seeking, corruption detection, recapture after errors, low overhead, and the ability to multiplex Opus with other codecs (including video) with minimal buffering. It also provides a live streamable format capable of delivery over a reliable stream-oriented transport, without requiring all the data (or even the total length of the data) up-front, in a form that is identical to the on-disk storage format.

Ogg bitstreams are made up of a series of "pages", each of which contains data from one or more "packets". Pages are the fundamental unit of multiplexing in an Ogg stream. Each page is associated with a particular logical stream and contains a capture pattern and checksum, flags to mark the beginning and end of the logical stream, and a "granule position" that represents an absolute position in the stream, to aid seeking. A single page can contain up to 65,025 octets of packet data from up to 255 different packets. Packets can be split arbitrarily across pages and continued from one page to the next (allowing packets much larger than would fit on a single page). Each page contains "lacing values" that indicate how the data is partitioned into packets, allowing a demultiplexer (demuxer) to recover the packet boundaries without examining the encoded data. A packet is said to "complete" on a page when the page contains the final lacing value corresponding to that packet.
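
As an illustration of how lacing values let a demuxer recover packet boundaries, the following non-normative sketch parses the fixed part of an Ogg page header as laid out in [RFC3533] and groups the lacing values into packet chunks. The names OggPage and parse_page_header are hypothetical and introduced only for this example. The sketch does not verify the page checksum.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class OggPage:
    continued: bool          # header_type bit 0x01: first chunk continues a packet
    bos: bool                # header_type bit 0x02: 'beginning of stream'
    eos: bool                # header_type bit 0x04: 'end of stream'
    granule_position: int    # signed 64-bit value; -1 means no packet completes here
    serial: int              # logical stream serial number
    sequence: int            # page sequence number
    chunk_lengths: List[int] # octets of each packet chunk on this page, in order
    last_continues: bool     # True if the final lacing value is 255


def parse_page_header(data: bytes) -> OggPage:
    """Parse the header and segment table at the start of `data` (no CRC check)."""
    if len(data) < 27 or data[:4] != b"OggS":
        raise ValueError("missing capture pattern")
    version, flags, granule, serial, sequence, _crc, nsegs = struct.unpack_from(
        "<BBqIIIB", data, 4)
    if version != 0:
        raise ValueError("unsupported stream structure version")
    lacing = data[27:27 + nsegs]
    if len(lacing) != nsegs:
        raise ValueError("truncated segment table")

    chunk_lengths: List[int] = []
    current = 0
    for value in lacing:
        current += value
        if value < 255:      # a lacing value below 255 completes a packet
            chunk_lengths.append(current)
            current = 0
    last_continues = nsegs > 0 and lacing[-1] == 255
    if last_continues:       # trailing data belongs to a packet that continues
        chunk_lengths.append(current)

    return OggPage(bool(flags & 0x01), bool(flags & 0x02), bool(flags & 0x04),
                   granule, serial, sequence, chunk_lengths, last_continues)
```

When the 'continued packet' flag is set, the first entry of chunk_lengths is the tail of a packet begun on an earlier page rather than a whole packet.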

This encapsulation defines the contents of the packet data, including the necessary headers, the organization of those packets into a logical stream, and the interpretation of the codec-specific granule position field. It does not attempt to describe or specify the existing Ogg container format. Readers unfamiliar with the basic concepts mentioned above are encouraged to review the details in [RFC3533].

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3. Packet Organization

An Ogg Opus stream is organized as follows (see Figure 1 for an example).

        Page 0         Pages 1 ... n        Pages (n+1) ...
     +------------+ +---+ +---+ ... +---+ +-----------+ +---------+ +--
     |            | |   | |   |     |   | |           | |         | |
     |+----------+| |+-----------------+| |+-------------------+ +-----
     |||ID Header|| ||  Comment Header || ||Audio Data Packet 1| | ...
     |+----------+| |+-----------------+| |+-------------------+ +-----
     |            | |   | |   |     |   | |           | |         | |
     +------------+ +---+ +---+ ... +---+ +-----------+ +---------+ +--
     ^      ^                           ^
     |      |                           |
     |      |                           Mandatory Page Break
     |      |
     |      ID header is contained on a single page
     |
     'Beginning Of Stream'
        

Figure 1: Example Packet Organization for a Logical Ogg Opus Stream

There are two mandatory header packets. The first packet in the logical Ogg bitstream MUST contain the identification (ID) header, which uniquely identifies a stream as Opus audio. The format of this header is defined in Section 5.1. It is placed alone (without any other packet data) on the first page of the logical Ogg bitstream and completes on that page. This page has its 'beginning of stream' flag set.

The second packet in the logical Ogg bitstream MUST contain the comment header, which contains user-supplied metadata. The format of this header is defined in Section 5.2. It MAY span multiple pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it completes.

All subsequent pages are audio data pages, and the Ogg packets they contain are audio data packets. Each audio data packet contains one Opus packet for each of N different streams, where N is typically one for mono or stereo, but MAY be greater than one for multichannel audio. The value N is specified in the ID header (see Section 5.1.1), and is fixed over the entire length of the logical Ogg bitstream.

The first (N - 1) Opus packets, if any, are packed one after another into the Ogg packet, using the self-delimiting framing from Appendix B of [RFC6716]. The remaining Opus packet is packed at the end of the Ogg packet using the regular, undelimited framing from Section 3 of [RFC6716]. All of the Opus packets in a single Ogg packet MUST be constrained to have the same duration. An implementation of this specification SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were a malformed Opus packet with an invalid Table Of Contents (TOC) sequence.

The TOC sequence at the beginning of each Opus packet indicates the coding mode, audio bandwidth, channel count, duration (frame size), and number of frames per packet, as described in Section 3.1 of [RFC6716]. The coding mode is one of SILK, Hybrid, or Constrained Energy Lapped Transform (CELT). The combination of coding mode, audio bandwidth, and frame size is referred to as the configuration of an Opus packet.
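
The TOC fields described above can be extracted with a few bit operations. The following non-normative sketch follows the configuration table in Section 3.1 of [RFC6716] and reports frame sizes in samples at 48 kHz. The names TocInfo and parse_toc are hypothetical, and the sketch inspects only the TOC byte and, for code 3 packets, the frame count byte. It does not validate the rest of the packet.

```python
from typing import NamedTuple


class TocInfo(NamedTuple):
    mode: str           # "SILK", "Hybrid", or "CELT"
    bandwidth: str      # "NB", "MB", "WB", "SWB", or "FB"
    frame_samples: int  # samples per frame at 48 kHz
    stereo: bool        # value of the TOC 's' bit
    frame_count: int    # number of frames in the packet

    @property
    def duration(self) -> int:
        """Total packet duration in samples at 48 kHz."""
        return self.frame_samples * self.frame_count


def parse_toc(packet: bytes) -> TocInfo:
    if not packet:
        raise ValueError("a zero-octet packet is malformed")
    toc = packet[0]
    config, stereo, code = toc >> 3, bool(toc & 0x04), toc & 0x03

    if config < 12:                       # SILK: 10, 20, 40, 60 ms
        mode = "SILK"
        bandwidth = ("NB", "MB", "WB")[config >> 2]
        frame_samples = (480, 960, 1920, 2880)[config & 3]
    elif config < 16:                     # Hybrid: 10, 20 ms
        mode = "Hybrid"
        bandwidth = ("SWB", "FB")[(config - 12) >> 1]
        frame_samples = (480, 960)[config & 1]
    else:                                 # CELT: 2.5, 5, 10, 20 ms
        mode = "CELT"
        bandwidth = ("NB", "WB", "SWB", "FB")[(config - 16) >> 2]
        frame_samples = (120, 240, 480, 960)[config & 3]

    if code == 0:
        frame_count = 1
    elif code in (1, 2):
        frame_count = 2
    else:                                 # code 3: count is in the next byte
        if len(packet) < 2:
            raise ValueError("code 3 packet without a frame count byte")
        frame_count = packet[1] & 0x3F
        if frame_count == 0:
            raise ValueError("code 3 packet with zero frames")

    info = TocInfo(mode, bandwidth, frame_samples, stereo, frame_count)
    if info.duration > 5760:              # more than 120 ms of audio
        raise ValueError("packet duration exceeds 120 ms")
    return info
```

A demuxer could apply such a function to every Opus packet in an Ogg packet and compare the resulting durations, since all of them are required to match.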

Packets are placed into Ogg pages in order until the end of stream. Audio data packets might span page boundaries. The first audio data page could have the 'continued packet' flag set (indicating the first audio data packet is continued from a previous page) if, for example, it was a live stream joined mid-broadcast, with the headers pasted on the front. If a page has the 'continued packet' flag set and one of the following conditions is also true:

o the previous page with packet data does not end in a continued packet (does not end with a lacing value of 255) OR

o the page sequence numbers are not consecutive,

then a demuxer MUST NOT attempt to decode the data for the first packet on the page unless the demuxer has some special knowledge that would allow it to interpret this data despite the missing pieces. An implementation MUST treat a zero-octet audio data packet as if it were a malformed Opus packet as described in Section 3.4 of [RFC6716].
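
The check on a page's first packet can be sketched as follows. This is a minimal, non-normative example: the Page type and the function name are hypothetical, the caller is assumed to pass the immediately preceding page of the same logical stream that carries packet data, and a missing previous page is treated as lost data (an assumption that matches the mid-broadcast join case above). A demuxer with special knowledge of the missing pieces could choose differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    sequence: int      # 32-bit page sequence number
    continued: bool    # 'continued packet' flag
    lacing: List[int]  # lacing values from the segment table


def first_packet_decodable(prev: Optional[Page], page: Page) -> bool:
    """Return False when the first packet on `page` has missing pieces."""
    if not page.continued:
        return True                 # first packet starts on this page
    if prev is None or not prev.lacing:
        return False                # assumption: no earlier data to join with
    if prev.lacing[-1] != 255:
        return False                # previous page did not end mid-packet
    if page.sequence != (prev.sequence + 1) & 0xFFFFFFFF:
        return False                # a page is missing in between
    return True
```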

A logical stream ends with a page with the 'end of stream' flag set, but implementations need to be prepared to deal with truncated streams that do not have a page marked 'end of stream'. There is no reason for the final packet on the last page to be a continued packet, i.e., for the final lacing value to be 255. However, demuxers might encounter such streams, possibly as the result of a transfer that did not complete or of corruption. If a packet continues onto a subsequent page (i.e., when the page ends with a lacing value of 255) and one of the following conditions is also true:

o the next page with packet data does not have the 'continued packet' flag set, OR

o there is no next page with packet data, OR

o the page sequence numbers are not consecutive,

then a demuxer MUST NOT attempt to decode the data from that packet unless the demuxer has some special knowledge that would allow it to interpret this data despite the missing pieces. There MUST NOT be any more pages in an Opus logical bitstream after a page marked 'end of stream'.

4. Granule Position

The granule position MUST be zero for the ID header page and the page where the comment header completes. That is, the first page in the logical stream and the last header page before the first audio data page both have a granule position of zero.

The granule position of an audio data page encodes the total number of PCM samples in the stream up to and including the last fully decodable sample from the last packet completed on that page. The granule position of the first audio data page will usually be larger than zero, as described in Section 4.5.

A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field is set to the special value '-1' in two's complement.

The granule position of an audio data page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream's granule position does not increment at twice the speed of a mono stream). It is possible to run an Opus decoder at other sampling rates, but all Opus packets encode samples at a sampling rate that evenly divides 48 kHz. Therefore, the value in the granule position field always counts samples assuming a 48 kHz decoding rate, and the rest of this specification makes the same assumption.

The duration of an Opus packet as defined in [RFC6716] can be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse the TOC sequence at the beginning of each Ogg packet to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below.

All other pages with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer can assign individual packets the same granule position when working forwards as when working backwards. For this to work, there cannot be any gaps.
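
The rule above can be illustrated with a short, non-normative sketch. Given the durations (in samples at 48 kHz) of the packets that complete on a page, one function computes the granule position the page is required to carry, and another works backwards from the page's granule position to the end position of each packet. The function names are hypothetical.

```python
from typing import List


def expected_page_granule(prev_granule: int, durations: List[int]) -> int:
    """Granule position required for a page, working forwards."""
    return prev_granule + sum(durations)


def packet_end_positions(page_granule: int, durations: List[int]) -> List[int]:
    """End granule position of each completed packet, working backwards."""
    positions: List[int] = []
    end = page_granule
    for duration in reversed(durations):
        positions.append(end)
        end -= duration
    positions.reverse()
    return positions


# Three 20 ms packets complete on a page that follows a page at position 960.
durations = [960, 960, 960]
page_granule = expected_page_granule(960, durations)
print(page_granule)                                   # 3840
print(packet_end_positions(page_granule, durations))  # [1920, 2880, 3840]
```

Working forwards from 960 gives the same per-packet positions (1920, 2880, 3840), which is the consistency that the absence of gaps guarantees.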

4.1. Repairing Gaps in Real-Time Streams

In order to support capturing a real-time stream that has lost or not transmitted packets, a multiplexer (muxer) SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) in place of the missing packets. Implementations that fail to do so still MUST NOT increment the granule position for a page by anything other than the number of samples contained in packets that actually complete on that page.

Only gaps that are a multiple of 2.5 ms are repairable, as these are the only durations that can be created by packet loss or discontinuous transmission. Muxers need not handle other gap sizes. Creating the necessary packets involves synthesizing a TOC byte (defined in Section 3.1 of [RFC6716]) -- and whatever additional internal framing is needed -- to indicate the packet duration for each stream. The actual length of each missing Opus frame inside the packet is zero bytes, as defined in Section 3.2.1 of [RFC6716].

Zero-byte frames MAY be packed into packets using any of codes 0, 1, 2, or 3. When successive frames have the same configuration, the higher code packings reduce overhead. Likewise, if the TOC configuration matches, the muxer MAY further combine the empty frames with previous or subsequent nonzero-length frames (using code 2 or variable bitrate (VBR) code 3).

[RFC6716] does not impose any requirements on the PLC, but this section outlines choices that are expected to have a positive influence on most PLC implementations, including the reference implementation. Synthesized TOC sequences SHOULD maintain the same mode, audio bandwidth, channel count, and frame size as the previous packet (if any). This is the simplest and usually the most well-tested case for the PLC to handle and it covers all losses that do not include a configuration switch, as defined in Section 4.5 of [RFC6716].

When a previous packet is available, keeping the audio bandwidth and channel count the same allows the PLC to provide maximum continuity in the concealment data it generates. However, if the size of the gap is not a multiple of the most recent frame size, then the frame size will have to change for at least some frames. Such changes SHOULD be delayed as long as possible to simplify things for PLC implementations.

As an example, a 95 ms gap could be encoded as nineteen 5 ms frames in two bytes with a single constant bitrate (CBR) code 3 packet. If the previous frame size was 20 ms, using four 20 ms frames followed by three 5 ms frames requires 4 bytes (plus an extra byte of Ogg lacing overhead), but allows the PLC to use its well-tested steady state behavior for as long as possible. The total bitrate of the latter approach, including Ogg overhead, is about 0.4 kbps, so the impact on file size is minimal.

Changing modes is discouraged, since this causes some decoder implementations to reset their PLC state. However, SILK and Hybrid mode frames cannot fill gaps that are not a multiple of 10 ms. If switching to CELT mode is needed to match the gap size, a muxer SHOULD do so at the end of the gap to allow the PLC to function for as long as possible.

In the example above, if the previous frame was a 20 ms SILK mode frame, the better solution is to synthesize a packet describing four 20 ms SILK frames, followed by a packet with a single 10 ms SILK frame, and finally a packet with a 5 ms CELT frame, to fill the 95 ms gap. This also requires four bytes to describe the synthesized packet data (two bytes for a CBR code 3 and one byte each for two code 0 packets) but three bytes of Ogg lacing overhead are needed to mark the packet boundaries. At 0.6 kbps, this is still a minimal bitrate impact over a naive, low-quality solution.
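
The two 95 ms examples can be reproduced with a small, non-normative sketch that synthesizes packets of zero-byte frames. It assumes a single-stream file (N = 1), so no self-delimiting framing is needed, and it assumes fullband CELT for the first example and wideband SILK for the second, since the text does not fix the audio bandwidth. The names frame_ms and plc_packet are hypothetical.

```python
def frame_ms(config: int) -> float:
    """Frame duration in ms for a TOC configuration number (0..31)."""
    if config < 12:
        return (10, 20, 40, 60)[config & 3]   # SILK
    if config < 16:
        return (10, 20)[config & 1]           # Hybrid
    return (2.5, 5, 10, 20)[config & 3]       # CELT


def plc_packet(config: int, frames: int, stereo: bool = False) -> bytes:
    """Build a packet of `frames` zero-byte frames to request concealment."""
    if not 0 <= config <= 31:
        raise ValueError("config out of range")
    if frames < 1 or frames * frame_ms(config) > 120:
        raise ValueError("packet duration must be between one frame and 120 ms")
    toc = (config << 3) | (0x04 if stereo else 0)
    if frames == 1:
        return bytes([toc])                   # code 0: one zero-byte frame
    # Code 3 with v = 0 (CBR) and p = 0 (no padding); the low 6 bits hold
    # the frame count, and every frame has length zero.
    return bytes([toc | 0x03, frames])


# Previous frame was 20 ms CELT fullband (config 31): four 20 ms frames,
# then three 5 ms frames (config 29).
celt_fill = [plc_packet(31, 4), plc_packet(29, 3)]

# Previous frame was 20 ms SILK wideband (config 9): four 20 ms SILK frames,
# one 10 ms SILK frame (config 8), then one 5 ms CELT wideband frame (config 21).
silk_fill = [plc_packet(9, 4), plc_packet(8, 1), plc_packet(21, 1)]

print(sum(len(p) for p in celt_fill))  # 4
print(sum(len(p) for p in silk_fill))  # 4
```

Both fills use four bytes of packet data. The second needs three Ogg packets and therefore three lacing values, in line with the overhead described above.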

Since medium-band audio is an option only in the SILK mode, wideband frames SHOULD be generated if switching from that configuration to CELT mode, to ensure that any PLC implementation that does try to migrate state between the modes will be able to preserve all of the available audio bandwidth.

4.2. Pre-skip

There is some amount of latency introduced during the decoding process, to allow for overlap in the CELT mode, stereo mixing in the SILK mode, and resampling. The encoder might have introduced additional latency through its own resampling and analysis (though the exact amount is not specified). Therefore, the first few samples produced by the decoder do not correspond to real input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples need to be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. However, a player will want to skip these samples after decoding them.

A 'pre-skip' field in the ID header (see Section 5.1) signals the number of samples that SHOULD be skipped (decoded but discarded) at the beginning of the stream, though some specific applications might have a reason for looking at that data. This amount need not be a multiple of 2.5 ms, MAY be smaller than a single packet, or MAY span the contents of several packets. These samples are not valid audio.
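
A player might apply pre-skip along the following lines. In this non-normative sketch, each decoded packet is modeled as a list with one entry per sample position at 48 kHz (an assumption made for illustration; a real decoder returns interleaved or planar PCM), and the function name is hypothetical. Every packet is still decoded, and only the output is discarded.

```python
from typing import Iterable, Iterator, List


def drop_pre_skip(decoded_packets: Iterable[List[int]],
                  pre_skip: int) -> Iterator[List[int]]:
    """Yield decoded output with the first `pre_skip` samples removed."""
    remaining = pre_skip
    for samples in decoded_packets:
        if remaining >= len(samples):
            remaining -= len(samples)   # whole packet falls inside pre-skip
            continue
        yield samples[remaining:]       # partial or no skipping in this packet
        remaining = 0


# Pre-skip need not align with packet boundaries: here it covers one
# whole 4-sample packet and half of the next.
packets = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(list(drop_pre_skip(packets, 6)))  # [[6, 7], [8, 9, 10, 11]]
```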

For example, if the first Opus frame uses the CELT mode, it will always produce 120 samples of windowed overlap-add data. However, the overlap data is initially all zeros (since there is no prior frame), meaning this cannot, in general, accurately represent the original audio. The SILK mode requires additional delay to account for its analysis and resampling latency. The encoder delays the original audio to avoid this problem.

The 'pre-skip' field MAY also be used to perform sample-accurate cropping of already encoded streams. In this case, a value of at least 3840 samples (80 ms) provides sufficient history to the decoder that it will have converged before the stream's output begins.

4.3. PCM Sample Position
4.3. PCM采样位置

The PCM sample position is determined from the granule position using the following formula:

'PCM sample position' = 'granule position' - 'pre-skip'

For example, if the granule position of the first audio data page is 59,971, and the pre-skip is 11,971, then the PCM sample position of the last decoded sample from that page is 48,000.

This can be converted into a playback time using the following formula:

                                    'PCM sample position'
                  'playback time' = ---------------------
                                           48000.0
        

The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock _after_ that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.
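
The two formulas in this section translate directly into code. The following non-normative sketch uses hypothetical function names and reproduces the example values given above.

```python
def pcm_sample_position(granule_position: int, pre_skip: int) -> int:
    """'PCM sample position' = 'granule position' - 'pre-skip'."""
    return granule_position - pre_skip


def playback_time(granule_position: int, pre_skip: int) -> float:
    """Playback time in seconds at the fixed 48 kHz granule rate."""
    return pcm_sample_position(granule_position, pre_skip) / 48000.0


print(pcm_sample_position(59971, 11971))  # 48000
print(playback_time(59971, 11971))        # 1.0
```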

Vorbis streams use a granule position smaller than the number of audio samples contained in the first audio data page to indicate that some of those samples are trimmed from the output (see [VORBIS-TRIM]). However, to do so, Vorbis requires that the first audio data page contains exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder might introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels might not fit on a single page.

4.4. End Trimming

The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio data page with completed packets is used to make this determination, or '0' is used if there were no previous audio data pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be no larger than the number decoded from the last packet.
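
The end-trimming computation can be sketched as follows. This is a non-normative example with a hypothetical function name. Clamping to the number of decoded samples when the difference is larger than expected is an assumption of this sketch, not a requirement stated above.

```python
def samples_to_keep(final_granule: int, prev_granule: int,
                    decoded_on_final_page: int) -> int:
    """Number of samples to keep from packets completing on the final page.

    `prev_granule` is the granule position of the most recent audio data
    page with completed packets, or 0 if there was none.
    """
    keep = final_granule - prev_granule
    if keep < 0:
        raise ValueError("granule position moved backwards")
    # Assumption: never return more samples than were actually decoded.
    return min(keep, decoded_on_final_page)


# One 20 ms packet (960 samples) completes on the final page, but the
# stream ends 460 samples after the previous page's position.
keep = samples_to_keep(47500, 47040, 960)
print(keep)        # 460
print(960 - keep)  # 500 samples discarded
```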

4.5. Restrictions on the Initial Granule Position

The granule position of the first audio data page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page. However, it MUST NOT be smaller, unless that page has the 'end of stream' flag set. Allowing a granule position larger than the number of samples allows the beginning of a stream to be cropped or a live stream to be joined without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played MAY be larger than '0'. Synchronization when multiplexing with other logical streams still uses the PCM sample position relative to '0' to compute sample times. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples SHOULD be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.

所有剩余页面的位置。这意味着即将播放的第一个样本前的PCM样本位置可能大于“0”。与其他逻辑流多路复用时的同步仍然使用相对于“0”的PCM采样位置来计算采样时间。这不会影响预跳过的行为:即使初始PCM样本位置大于零,也应从解码输出的开始处跳过“预跳过”样本。

On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions are non-negative. An implementation MUST treat any stream as invalid if the granule position is smaller than the number of samples contained in packets that complete on the first audio data page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page.

另一方面,由于颗粒位置是非负的,因此小于解码样本数量的颗粒位置防止解复用器向后工作以将每个分组或每个单独样本分配给有效的颗粒位置。如果颗粒位置小于在第一个音频数据页上使用已完成数据包完成的数据包中包含的样本数,则实现必须将任何流视为无效,除非该页面设置了“流结束”标志。它可能会延迟此操作,直到解码该页上完成的最后一个数据包。

If that page has the 'end of stream' flag set, a demuxer MUST treat any stream as invalid if its granule position is smaller than the 'pre-skip' amount. This would indicate that there are more samples to be skipped from the initial decoded output than exist in the stream. If the granule position is smaller than the number of decoded samples produced by the packets that complete on that page, then a demuxer MUST use an initial granule position of '0', and can work forwards from '0' to timestamp individual packets. If the granule position is larger than the number of decoded samples available, then the demuxer MUST still work backwards as described above, even if the 'end of stream' flag is set, to determine the initial granule position, and thus the initial PCM sample position. Both of these will be greater than '0' in this case.

如果该页面设置了“流结束”标志,则如果任何流的颗粒位置小于“预跳过”量,则解复用器必须将其视为无效流。这将表明从初始解码输出中跳过的样本比流中存在的样本多。如果颗粒位置小于在该页上完成的数据包生成的解码样本数,则解复用器必须使用初始颗粒位置“0”,并且可以从“0”向前工作,为各个数据包加上时间戳。如果颗粒位置大于可用解码样本数,则解复用器仍必须如上所述向后工作,即使设置了“流结束”标志,以确定初始颗粒位置,从而确定初始PCM样本位置。在这种情况下,这两个值都将大于“0”。
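
The rules in this section can be summarized in a short sketch. It is illustrative only: the function name, its parameters, and the use of ValueError to signal an invalid stream are assumptions and are not defined by this specification.

```python
def initial_granule_position(page_granule, samples_on_page, pre_skip,
                             end_of_stream):
    """Granule position just before the first decoded sample.

    page_granule and samples_on_page describe the first audio data page
    with a completed packet. Raises ValueError for an invalid stream.
    """
    if end_of_stream and page_granule < pre_skip:
        # More samples to skip than exist in the stream.
        raise ValueError("granule position smaller than pre-skip")
    if page_granule < samples_on_page:
        if not end_of_stream:
            raise ValueError("granule position smaller than sample count")
        # End trimming on the only audio page: work forwards from 0.
        return 0
    # Otherwise work backwards from the page's granule position.
    return page_granule - samples_on_page
```

For example, a first page holding 1920 samples with a granule position of 50000 yields an initial granule position of 48080, so the initial PCM sample position is also greater than '0'.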

4.6. Seeking and Pre-roll
4.6. 寻找和预滚

Seeking in Ogg files is best performed using a bisection search for a page whose granule position corresponds to a PCM position at or before the seek target. With appropriately weighted bisection, accurate seeking can be performed in just one or two bisections on average, even in multi-gigabyte files. See [SEEKING] for an example of general implementation guidance.

在Ogg文件中进行定位时,最好使用二分搜索查找颗粒位置对应于搜索目标处或之前的PCM位置的页面。使用适当加权的二分法,即使在数GB的文件中,平均也只需一到两次二分即可完成精确定位。有关一般实现指南的示例,请参见[SEEKING]。

When seeking within an Ogg Opus stream, an implementation SHOULD start decoding (and discarding the output) at least 3840 samples (80 ms) prior to the seek target in order to ensure that the output audio is correct by the time it reaches the seek target. This "pre-roll" is separate from, and unrelated to, the pre-skip used at the beginning of the stream. If the point 80 ms prior to the seek target comes before the initial PCM sample position, an implementation SHOULD start decoding from the beginning of the stream, applying pre-skip as normal, regardless of whether the pre-skip is larger or smaller than 80 ms, and then continue to discard samples to reach the seek target (if any).

当在Ogg Opus流中搜索时,实现应在搜索目标之前至少3840个样本(80毫秒)处开始解码(并丢弃输出),以确保输出音频在到达搜索目标时是正确的。此“预滚动”与流开始时使用的预跳过相互独立,且与之无关。如果搜索目标之前80毫秒的点位于初始PCM样本位置之前,实现应从流的开头开始解码,照常应用预跳过(无论预跳过大于还是小于80毫秒),然后继续丢弃样本直到到达搜索目标(如果有)。
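
A possible way to pick the decode start point for a seek is sketched below. The names are hypothetical and positions are PCM sample positions at 48 kHz; locating the page that contains the returned position (for example, by bisection) is outside the scope of this sketch.

```python
PRE_ROLL_SAMPLES = 3840  # 80 ms at 48 kHz


def decode_start_for_seek(target_pcm, initial_pcm):
    """Return (start_pcm, from_beginning) for a seek to target_pcm.

    When from_beginning is True, the caller decodes from the start of the
    stream, applies pre-skip as normal, and then keeps discarding output
    until target_pcm is reached.
    """
    start = target_pcm - PRE_ROLL_SAMPLES
    if start < initial_pcm:
        return initial_pcm, True
    return start, False
```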

5. Header Packets
5. 头数据包

An Ogg Opus logical stream contains exactly two mandatory header packets: an identification header and a comment header.

Ogg Opus逻辑流正好包含两个必需的头数据包:标识头和注释头。

5.1. Identification Header
5.1. 识别头
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |      'O'      |      'p'      |      'u'      |      's'      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |      'H'      |      'e'      |      'a'      |      'd'      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |  Version = 1  | Channel Count |           Pre-skip            |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                     Input Sample Rate (Hz)                    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |   Output Gain (Q7.8 in dB)    | Mapping Family|               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :
     |                                                               |
     :               Optional Channel Mapping Table...               :
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 2: ID Header Packet

图2:ID头数据包

The fields in the identification (ID) header have the following meaning:

标识(ID)标题中的字段具有以下含义:

1. Magic Signature:

1. 魔术签名:

This is an 8-octet (64-bit) field that allows codec identification and is human readable. It contains, in order, the magic numbers:

这是一个8八位字节(64位)字段,允许编解码器识别,并且是人类可读的。它按顺序包含魔术数字:

0x4F 'O'

0x70 'p'

0x75 'u'

0x73 's'

0x48 'H'

0x65 'e'

0x61 'a'

0x64 'd'

Starting with "Op" helps distinguish it from audio data packets, as this is an invalid TOC sequence.

以“Op”开头有助于将其与音频数据包区分开来,因为这是一个无效的TOC序列。

2. Version (8 bits, unsigned):

2. 版本(8位,无符号):

The version number MUST always be '1' for this version of the encapsulation specification. Implementations SHOULD treat streams where the upper four bits of the version number match that of a recognized specification as backwards compatible with that specification. That is, the version number can be split into "major" and "minor" version sub-fields, with changes to the minor sub-field (in the lower four bits) signaling compatible changes. For example, an implementation of this specification SHOULD accept any stream with a version number of '15' or less, and SHOULD assume any stream with a version number '16' or greater is incompatible. The initial version '1' was chosen to keep implementations from relying on this octet as a null terminator for the "OpusHead" string.

对于此版本的封装规范,版本号必须始终为“1”。实现应将版本号的前四位与已识别规范的前四位匹配的流视为向后兼容该规范。也就是说,版本号可以分为“主要”和“次要”版本子字段,次要子字段的更改(在较低的四位中)表示兼容的更改。例如,此规范的实现应接受版本号为“15”或更低的任何流,并应假定版本号为“16”或更高的任何流不兼容。选择初始版本“1”是为了防止实现依赖此八位字节作为“OpusHead”字符串的空终止符。
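
The major/minor split described above amounts to a comparison of the upper four bits. A minimal sketch follows, with a hypothetical function name and assuming the implementation recognizes only version '1' of this specification:

```python
RECOGNIZED_VERSION = 1  # the version defined by this specification


def is_compatible_version(version):
    """True if the upper four bits match those of the recognized version."""
    if not 0 <= version <= 255:
        raise ValueError("version is a single unsigned octet")
    return (version >> 4) == (RECOGNIZED_VERSION >> 4)
```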

3. Output Channel Count 'C' (8 bits, unsigned):

3. 输出通道计数“C”(8位,无符号):

This is the number of output channels. This might be different from the number of encoded channels, which can change on a packet-by-packet basis. This value MUST NOT be zero. The maximum allowable value depends on the channel mapping family, and might be as large as 255. See Section 5.1.1 for details.

这是输出通道的数量。这可能不同于编码信道的数量,编码信道的数量可以在分组的基础上改变。此值不能为零。允许的最大值取决于通道贴图族,可能高达255。详见第5.1.1节。

4. Pre-skip (16 bits, unsigned, little endian):

4. 预跳过(16位,无符号,小端):

This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position. When cropping the beginning of existing Ogg Opus streams, a pre-skip of at least 3,840 samples (80 ms) is RECOMMENDED to ensure complete convergence in the decoder.

这是开始播放时从解码器输出中丢弃的样本数(48 kHz),也是从页面的颗粒位置减去以计算其PCM样本位置的数目。在裁剪现有Ogg Opus流的开始时,建议至少跳过3840个样本(80毫秒),以确保解码器中的完全收敛。
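
As an illustration of the second use of pre-skip, the following sketch converts a granule position to a PCM sample position and a playback time. The function names are hypothetical; granule positions are counted at 48 kHz.

```python
def pcm_sample_position(granule_position, pre_skip):
    """PCM sample position corresponding to a page's granule position."""
    return granule_position - pre_skip


def playback_time_seconds(granule_position, pre_skip):
    """Playback time of that position, given the fixed 48 kHz granule rate."""
    return pcm_sample_position(granule_position, pre_skip) / 48000.0
```

With a pre-skip of 312, a granule position of 48312 corresponds to PCM sample position 48000, i.e., one second of played audio.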

5. Input Sample Rate (32 bits, unsigned, little endian):

5. 输入采样率(32位,无符号,小端):

This is the sample rate of the original input (before encoding), in Hz. This field is _not_ the sample rate to use for playback of the encoded data.

这是原始输入(编码前)的采样率,单位为Hz。此字段不是用于播放编码数据的采样率。

Opus can switch between internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream can have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the audio passed to the encoder is not preserved by the lossy compression.

Opus可以在4、6、8、12和20 kHz的内部音频带宽之间切换。流中的每个数据包可以具有不同的音频带宽。无论音频带宽如何,参考解码器都支持以8、12、16、24或48 kHz的采样率对任何流进行解码。传递给编码器的音频的原始采样率不会被有损压缩保留。

An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:

Ogg Opus播放器应根据以下程序选择播放采样率:

1. If the hardware supports 48 kHz playback, decode at 48 kHz.

1. 如果硬件支持48 kHz播放,则以48 kHz解码。

2. Otherwise, if the hardware's highest available sample rate is a supported rate, decode at this sample rate.

2. 否则,如果硬件的最高可用采样率是支持的速率,则以该采样率解码。

3. Otherwise, if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher Opus supported rate above the highest available hardware rate and resample.

3. 否则,如果硬件的最高可用采样率小于48 kHz,则以高于最高可用硬件速率的下一个更高Opus支持速率解码并重新采样。

4. Otherwise, decode at 48 kHz and resample.

4. 否则,以48 kHz解码并重新采样。
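
The four-step procedure above can be sketched as follows. The function name and the representation of hardware capabilities as a collection of integer rates in Hz are assumptions made for illustration.

```python
OPUS_RATES = (8000, 12000, 16000, 24000, 48000)  # rates the decoder supports


def select_decode_rate(hardware_rates):
    """Return (decode_rate, needs_resample) for the given hardware rates."""
    if 48000 in hardware_rates:                       # step 1
        return 48000, False
    highest = max(hardware_rates)
    if highest in OPUS_RATES:                         # step 2
        return highest, False
    if highest < 48000:                               # step 3
        next_rate = min(r for r in OPUS_RATES if r > highest)
        return next_rate, True
    return 48000, True                                # step 4
```

For instance, hardware limited to 22050 Hz leads to decoding at 24 kHz followed by resampling, while hardware limited to 44100 Hz leads to decoding at 48 kHz followed by resampling.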

However, the 'input sample rate' field allows the muxer to pass the sample rate of the original input stream as metadata. This is useful when the user requires the output sample rate to match the input sample rate. For example, when not playing the output, an implementation writing PCM format samples to disk might choose to resample the audio back to the original input sample rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate.

但是,“输入采样率”字段允许muxer将原始输入流的采样率作为元数据传递。当用户要求输出采样率与输入采样率匹配时,这非常有用。例如,当不播放输出时,将PCM格式样本写入磁盘的实现可能会选择将音频重新采样回原始输入采样率,以减少用户的惊讶,用户可能有理由期望以相同的采样率返回文件。

A value of zero indicates "unspecified". Muxers SHOULD write the actual input sample rate or zero, but implementations that do something with this field SHOULD take care to behave sanely if given crazy values (e.g., do not actually upsample the output to 10 MHz if requested). Implementations SHOULD support input sample rates between 8 kHz and 192 kHz (inclusive). Rates outside this range MAY be ignored by falling back to the default rate of 48 kHz instead.

值为零表示“未指定”。多路复用器应写入实际输入采样率或零,但使用此字段执行操作的实现应注意在给定疯狂值时表现正常(例如,如果请求,不实际将输出采样到10 MHz)。实现应支持8 kHz至192 kHz(含)之间的输入采样率。此范围之外的速率可以通过返回默认速率48 kHz来忽略。
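
One way to guard against unreasonable values when an implementation wants to honor this field, for example when writing PCM to disk, is sketched below. The function name is hypothetical, and falling back to 48 kHz for every out-of-range value is one permitted choice rather than a requirement.

```python
DEFAULT_RATE = 48000


def effective_output_rate(input_sample_rate):
    """Rate to resample to when matching the original input is desired."""
    # Zero means "unspecified"; values outside 8 kHz..192 kHz are ignored.
    if 8000 <= input_sample_rate <= 192000:
        return input_sample_rate
    return DEFAULT_RATE
```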

6. Output Gain (16 bits, signed, little endian):

6. 输出增益(16位,有符号,小端):

This is a gain to be applied when decoding. It is 20*log10 of the factor by which to scale the decoder output to achieve the desired playback volume, stored in a 16-bit, signed, two's complement fixed-point value with 8 fractional bits (i.e., Q7.8 [Q-NOTATION]).

这是解码时应用的增益。它是用于缩放解码器输出以达到所需播放音量的因子的20*log10,存储为带8个小数位的16位有符号二进制补码定点值(即Q7.8 [Q-NOTATION])。

To apply the gain, an implementation could use the following:

要应用增益,实现可以使用以下内容:

                 sample *= pow(10, output_gain/(20.0*256))
        

where 'output_gain' is the raw 16-bit value from the header.

其中'output_gain'是来自报头的原始16位值。
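
The same computation, including reading the signed little-endian field, might look like this in Python. The helper names are hypothetical; only the formula itself comes from the text above.

```python
import struct


def read_output_gain(two_octets):
    """Decode the raw Q7.8 output gain (signed 16-bit, little endian)."""
    (raw,) = struct.unpack("<h", two_octets)
    return raw


def gain_factor(raw_output_gain):
    """Linear scale factor for a raw Q7.8 gain expressed in dB."""
    return 10.0 ** (raw_output_gain / (20.0 * 256))


# The octets 0x00 0xFF encode -256, i.e., -1 dB.
raw = read_output_gain(b"\x00\xff")
scaled_sample = 0.5 * gain_factor(raw)
```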

Players and media frameworks SHOULD apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN (see Section 5.2), the adjustment MUST be applied in addition to this output gain in order to achieve playback at the normalized volume.

默认情况下,播放器和媒体框架应该应用它。如果播放器选择应用任何音量调整或增益修改,如R128_TRACK_GAIN(参见第5.2节),则除了此输出增益外,还必须应用该调整,以便在标准化音量下实现播放。

A muxer SHOULD set this field to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. A nonzero output gain indicates the gain was adjusted after encoding, or that a user wished to adjust the gain for playback while preserving the ability to recover the original signal amplitude.

muxer应将该字段设置为零,并在编码前应用任何增益,前提是这是可能的,并且不会与用户的意愿相冲突。非零输出增益表示编码后调整了增益,或者用户希望调整增益以便回放,同时保持恢复原始信号振幅的能力。

Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128 gain tags (see below) without saturating.

虽然输出增益的范围非常大(+/-128 dB,足以将听不见的声音放大到身体疼痛的阈值),但大多数应用程序只能合理地使用该范围的一小部分(约为零)。大范围的部分作用是确保增益始终可以在OpusHead和R128增益标签(见下文)之间无损传输,而不会饱和。

7. Channel Mapping Family (8 bits, unsigned):

7. 通道映射系列(8位,无符号):

This octet indicates the order and semantic meaning of the output channels.

此八位组表示输出通道的顺序和语义。

Each currently specified value of this octet indicates a mapping family, which defines a set of allowed channel counts, and the ordered set of channel names for each allowed channel count. The details are described in Section 5.1.1.

此八位字节的每个当前指定值表示一个映射族,该映射族定义了一组允许的通道计数,以及每个允许的通道计数的有序通道名称集。详情见第5.1.1节。

8. Channel Mapping Table:

8. 通道映射表:

This table defines the mapping from encoded streams to output channels. Its contents are specified in Section 5.1.1.

此表定义了从编码流到输出通道的映射。其内容见第5.1.1节。

All fields in the ID headers are REQUIRED, except for 'channel mapping table', which MUST be omitted when the channel mapping family is 0, but is REQUIRED otherwise. Implementations SHOULD treat a stream as invalid if it contains an ID header that does not have enough data for these fields, even if it contains a valid 'magic signature'. Future versions of this specification, even backwards-compatible versions, might include additional fields in the ID header. If an ID header has a compatible major version, but a larger minor version, an implementation MUST NOT treat it as invalid for containing additional data not specified here, provided it still completes on the first page.

ID标头中的所有字段均为必填字段,但“通道映射表”除外,当通道映射族为0时,必须忽略该字段,否则为必填字段。如果流包含的ID头没有足够的数据用于这些字段,即使它包含有效的“魔术签名”,实现也应将其视为无效。本规范的未来版本,甚至向后兼容的版本,可能会在ID头中包含其他字段。如果ID头具有兼容的主版本,但具有较大的次版本,则实现不得将其视为包含此处未指定的其他数据的无效,前提是它仍然在第一页完成。
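
Putting the field descriptions together, a parser for the ID header packet might be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the function name, the returned dictionary keys, and the use of ValueError are hypothetical, and trailing data beyond the known fields is simply ignored.

```python
import struct


def parse_id_header(packet):
    """Parse an Ogg Opus ID header packet (bytes) into a dict."""
    if len(packet) < 19 or packet[:8] != b"OpusHead":
        raise ValueError("not an Opus ID header")
    # Fixed fields after the 8-octet magic signature, all little endian.
    version, channels, pre_skip, input_rate, gain, family = \
        struct.unpack_from("<BBHIhB", packet, 8)
    if version > 15:
        raise ValueError("incompatible major version")
    if channels == 0:
        raise ValueError("channel count must not be zero")
    if family == 0:
        if channels > 2:
            raise ValueError("mapping family 0 allows 1 or 2 channels")
        # Family 0 defaults; the channel mapping table is not coded.
        streams, coupled, mapping = 1, channels - 1, list(range(channels))
    else:
        if len(packet) < 21 + channels:
            raise ValueError("truncated channel mapping table")
        streams, coupled = packet[19], packet[20]
        mapping = list(packet[21:21 + channels])
    return {
        "version": version, "channels": channels, "pre_skip": pre_skip,
        "input_sample_rate": input_rate, "output_gain": gain,
        "mapping_family": family, "stream_count": streams,
        "coupled_count": coupled, "mapping": mapping,
    }
```

A stereo family 0 header is exactly 19 octets long; for other families the table adds two octets plus one octet per output channel.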

5.1.1. Channel Mapping
5.1.1. 通道映射

An Ogg Opus stream allows mapping one number of Opus streams (N) to a possibly larger number of decoded channels (M + N) to yet another number of output channels (C), which might be larger or smaller than the number of decoded channels. The order and meaning of these channels are defined by a channel mapping, which consists of the 'channel mapping family' octet and, for channel mapping families other than family 0, a 'channel mapping table', as illustrated in Figure 3.

Ogg Opus流允许将一个数量的Opus流(N)映射到可能更大数量的解码信道(M+N)到另一个数量的输出信道(C),其可能大于或小于解码信道的数量。这些通道的顺序和意义由通道映射定义,该映射由“通道映射族”八位字节组成,对于非族0的通道映射族,则由“通道映射表”组成,如图3所示。

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                                     +-+-+-+-+-+-+-+-+
                                                     | Stream Count  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | Coupled Count |              Channel Mapping...               :
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 3: Channel Mapping Table

图3:通道映射表

The fields in the channel mapping table have the following meaning:

通道映射表中的字段具有以下含义:

1. Stream Count 'N' (8 bits, unsigned):

1. 流计数“N”(8位,无符号):

This is the total number of streams encoded in each Ogg packet. This value is necessary to correctly parse the packed Opus packets inside an Ogg packet, as described in Section 3. This value MUST NOT be zero, as without at least one Opus packet with a valid TOC sequence, a demuxer cannot recover the duration of an Ogg packet.

这是每个Ogg数据包中编码的流的总数。如第3节所述,该值对于正确解析Ogg数据包内的打包Opus数据包是必需的。此值不得为零,因为如果没有至少一个具有有效TOC序列的Opus数据包,解复用器将无法恢复Ogg数据包的持续时间。

For channel mapping family 0, this value defaults to 1, and is not coded.

对于通道映射族0,此值默认为1,并且未进行编码。

2. Coupled Stream Count 'M' (8 bits, unsigned):

2. 耦合流计数“M”(8位,无符号):

This is the number of streams whose decoders are to be configured to produce two channels (stereo). This MUST be no larger than the total number of streams, N.

这是将其解码器配置为产生两个通道(立体声)的流的数量。这不得大于流的总数N。

Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the audio being encoded. The original channel count of the audio passed to the encoder is not necessarily preserved by the lossy compression.

Opus流中的每个数据包的内部通道计数为1或2,可以根据数据包的不同而变化。这由编码器根据比特率和正在编码的音频选择。传递给编码器的音频的原始通道计数不一定由有损压缩保留。

Regardless of the internal channel count, any Opus stream can be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The 'coupled stream count' field indicates that the decoders for the first M Opus streams are to be initialized for stereo (two-channel) output, and the remaining (N - M) decoders are to be initialized for mono (a single channel) only. The total number of decoded channels, (M + N), MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.

无论内部通道数如何,通过适当初始化解码器,任何Opus流都可以解码为单声道(单声道)或立体声(双声道)。“耦合流计数”字段表示第一个M个Opus流的解码器将被初始化为立体声(双通道)输出,其余的(N-M)解码器将仅被初始化为单声道(单通道)。解码信道的总数(M+N)不得大于255,因为无法索引比信道映射中更多的信道。

For channel mapping family 0, this value defaults to (C - 1) (i.e., 0 for mono and 1 for stereo), and is not coded.

对于通道映射族0,该值默认为(C-1)(即,0表示单声道,1表示立体声),并且未进行编码。

3. Channel Mapping (8*C bits):

3. 通道映射(8*C位):

This contains one octet per output channel, indicating which decoded channel is to be used for each one. Let 'index' be the value of this octet for a particular output channel. This value MUST either be smaller than (M + N) or be the special value 255. If 'index' is less than 2*M, the output MUST be taken from decoding stream ('index'/2) as stereo and selecting the left channel if 'index' is even, and the right channel if 'index' is odd. If 'index' is 2*M or larger, but less than 255, the output MUST be taken from decoding stream ('index' - M) as mono. If 'index' is 255, the corresponding output channel MUST contain pure silence.

每个输出通道包含一个八位组,指示每个通道将使用哪个解码通道。让'index'为特定输出通道的该八位字节的值。此值必须小于(M+N)或为特殊值255。如果'index'小于2*M,则输出必须来自解码流('index'/2)作为立体声,如果'index'为偶数,则选择左声道,如果'index'为奇数,则选择右声道。如果'index'为2*M或更大,但小于255,则必须将输出从解码流('index'-M)中提取为mono。如果“索引”为255,则相应的输出通道必须包含纯静音。

The number of output channels, C, is not constrained to match the number of decoded channels (M + N). A single index value MAY appear multiple times, i.e., the same decoded channel might be mapped to multiple output channels. Some decoded channels might not be assigned to any output channel, as well.

输出通道的数量C不受限制以匹配解码通道的数量(M+N)。单个索引值可能出现多次,即,同一解码通道可能映射到多个输出通道。某些解码通道也可能未分配给任何输出通道。

For channel mapping family 0, the first index defaults to 0, and if C == 2, the second index defaults to 1. Neither index is coded.

对于通道映射族0,第一个索引默认为0,如果C==2,第二个索引默认为1。两个索引都没有编码。
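
The constraints on N and M and the index rules above can be sketched as two small helpers. The function names and the (stream, channel) return convention are assumptions for illustration; the example mapping in the usage note is hypothetical.

```python
def validate_mapping(stream_count, coupled_count, mapping):
    """Raise ValueError if the channel mapping table breaks a MUST rule."""
    n, m = stream_count, coupled_count
    if n == 0:
        raise ValueError("stream count must not be zero")
    if m > n:
        raise ValueError("coupled count exceeds stream count")
    if m + n > 255:
        raise ValueError("more than 255 decoded channels")
    for index in mapping:
        if index != 255 and index >= m + n:
            raise ValueError("mapping index out of range")


def resolve_output_channel(index, coupled_count):
    """Return (stream, channel) for an index, or None for pure silence.

    channel is 0 for left (or mono) and 1 for right.
    """
    if index == 255:
        return None
    if index < 2 * coupled_count:
        # Coupled (stereo) stream: even index is left, odd is right.
        return index // 2, index % 2
    # Remaining streams are decoded as mono.
    return index - coupled_count, 0
```

With N = 4 and M = 2, a mapping of 0, 4, 1, 2, 3, 5 takes output channel 1 from stream 2 decoded as mono, and output channels 0 and 2 from the left and right channels of stream 0.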

After producing the output channels, the channel mapping family determines the semantic meaning of each one. There are three defined mapping families in this specification.

生成输出通道后,通道映射族确定每个通道的语义。本规范中定义了三个映射族。

5.1.1.1. Channel Mapping Family 0
5.1.1.1. 通道映射族0

Allowed numbers of channels: 1 or 2. RTP mapping. This is the same channel interpretation as [RFC7587].

允许的通道数:1或2。RTP映射。这与[RFC7587]的通道解释相同。

o 1 channel: monophonic (mono).

o 1声道:单声道(单声道)。

o 2 channels: stereo (left, right).

o 2声道:立体声(左、右)。

Special mapping: This channel mapping family also indicates that the content consists of a single Opus stream that is stereo if and only if C == 2, with stream index 0 mapped to output channel 0 (mono, or left channel) and stream index 1 mapped to output channel 1 (right channel) if stereo. When the 'channel mapping family' octet has this value, the channel mapping table MUST be omitted from the ID header packet.

特殊映射:此频道映射系列还指示内容由单个Opus流组成,当且仅当C==2时为立体声,流索引0映射到输出频道0(单声道或左声道),流索引1映射到输出频道1(右声道)(如果为立体声)。当“channel mapping family”八位字节具有此值时,必须从ID头数据包中省略通道映射表。

5.1.1.2. Channel Mapping Family 1
5.1.1.2. 通道映射族1

Allowed numbers of channels: 1...8. Vorbis channel order (see below).

允许的频道数:1…8。Vorbis通道顺序(见下文)。

Each channel is assigned to a speaker location in a conventional surround arrangement. Specific locations depend on the number of channels, and are given below in order of the corresponding channel indices.

在传统环绕布置中,每个声道被分配到扬声器位置。具体位置取决于通道数量,下面按相应通道索引的顺序给出。

o 1 channel: monophonic (mono).

o 1声道:单声道(单声道)。

o 2 channels: stereo (left, right).

o 2声道:立体声(左、右)。

o 3 channels: linear surround (left, center, right).

o 3个通道:线性环绕(左、中、右)。

o 4 channels: quadraphonic (front left, front right, rear left, rear right).

o 4声道:四声道(左前、右前、左后、右后)。

o 5 channels: 5.0 surround (front left, front center, front right, rear left, rear right).

o 5声道:5.0环绕声(左前、中前、右前、左后、右后)。

o 6 channels: 5.1 surround (front left, front center, front right, rear left, rear right, LFE).

o 6声道:5.1环绕声(左前、前中、右前、左后、右后、LFE)。

o 7 channels: 6.1 surround (front left, front center, front right, side left, side right, rear center, LFE).

o 7声道:6.1环绕声(左前、前中、右前、左侧、右侧、后中、LFE)。

o 8 channels: 7.1 surround (front left, front center, front right, side left, side right, rear left, rear right, LFE).

o 8声道:7.1环绕声(左前、前中、右前、左侧、右侧、左后、右后、LFE)。

This set of surround options and speaker location orderings is the same as those used by the Vorbis codec [VORBIS-MAPPING]. The ordering is different from the one used by the WAVE [WAVE-MULTICHANNEL] and Free Lossless Audio Codec (FLAC) [FLAC] formats, so correct ordering requires permutation of the output channels when decoding to or encoding from those formats. "LFE" here refers to a Low Frequency Effects channel, often mapped to a subwoofer with no particular spatial position. Implementations SHOULD identify "side" or "rear" speaker locations with "surround" and "back" as appropriate when interfacing with audio formats or systems that prefer that terminology.

这组环绕声选项和扬声器位置顺序与Vorbis编解码器[VORBIS-MAPPING]使用的相同。该顺序不同于WAVE[WAVE-MULTICHANNEL]和Free Lossless Audio Codec(FLAC)[FLAC]格式所使用的顺序,因此在解码到这些格式或从这些格式编码时,需要对输出通道重新排列才能得到正确的顺序。“LFE”在这里指的是低频效果通道,通常映射到没有特定空间位置的超低音扬声器。当与偏好“surround”和“back”术语的音频格式或系统对接时,实现应酌情将“侧面”或“后部”扬声器位置对应为“surround”和“back”。
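
The channel orders listed above, and the kind of permutation needed when writing to another format, can be sketched as follows. The table reproduces the lists in this section; the WAVE 5.1 order used in the example (front left, front right, front center, LFE, rear left, rear right) is stated here as an assumption about that format and should be checked against [WAVE-MULTICHANNEL]. All names are hypothetical.

```python
FAMILY1_LAYOUTS = {
    1: ("mono",),
    2: ("left", "right"),
    3: ("left", "center", "right"),
    4: ("front left", "front right", "rear left", "rear right"),
    5: ("front left", "front center", "front right",
        "rear left", "rear right"),
    6: ("front left", "front center", "front right",
        "rear left", "rear right", "LFE"),
    7: ("front left", "front center", "front right",
        "side left", "side right", "rear center", "LFE"),
    8: ("front left", "front center", "front right",
        "side left", "side right", "rear left", "rear right", "LFE"),
}

# Assumed WAVE ordering for 5.1 content (see the lead-in).
WAVE_5_1 = ("front left", "front right", "front center",
            "LFE", "rear left", "rear right")


def permutation(source_order, target_order):
    """Indices into source_order that produce target_order."""
    return [source_order.index(name) for name in target_order]


def reorder(frame, perm):
    """Reorder one frame (one sample per channel) using perm."""
    return [frame[i] for i in perm]


to_wave = permutation(FAMILY1_LAYOUTS[6], WAVE_5_1)
```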

5.1.1.3. Channel Mapping Family 255
5.1.1.3. 通道映射族255

Allowed numbers of channels: 1...255. No defined channel meaning.

允许的通道数:1…255。没有定义的通道含义。

Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams. Offline implementations MAY deinterleave the output into separate PCM files, one per channel. Implementations SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.

通道未标识。通用播放器不应尝试播放这些流。脱机实现可以将输出解交错到单独的PCM文件中,每个通道一个。实现不应为映射到流索引255(纯静音)的通道生成输出,除非它们没有其他方法来指示非静音通道的索引。

5.1.1.4. Undefined Channel Mappings
5.1.1.4. 未定义的通道映射

The remaining channel mapping families (2...254) are reserved. A demuxer implementation encountering a reserved 'channel mapping family' value SHOULD act as though the value is 255.

保留其余的通道映射族(2…254)。遇到保留的“通道映射族”值的解复用器实现应视为该值为255。

5.1.1.5. Downmixing
5.1.1.5. 下混

An Ogg Opus player MUST support any valid channel mapping with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.

Ogg Opus播放器必须支持通道映射族为0或1的任何有效通道映射,即使通道数与物理连接的音频硬件不匹配。播放器应根据需要进行通道混合以增加或减少通道数量。

Implementations MAY use the matrices in Figures 4 through 9 to implement downmixing from multichannel files using channel mapping family 1 (Section 5.1.1.2), which are known to give acceptable results for stereo. Matrices for 3 and 4 channels are normalized so each coefficient row sums to 1 to avoid clipping. For 5 or more channels, they are normalized to 2 as a compromise between clipping and dynamic range reduction.

实现可使用图4至图9中的矩阵,使用通道映射系列1(第5.1.1.2节)从多通道文件实现下混频,已知该系列可为立体声提供可接受的结果。3和4个通道的矩阵被归一化,因此每个系数行和为1,以避免剪裁。对于5个或更多通道,它们被标准化为2,作为剪裁和动态范围缩小之间的折衷。

In these matrices the front-left and front-right channels are generally passed through directly. When a surround channel is split between both the left and right stereo channels, coefficients are chosen so their squares sum to 1, which helps preserve the perceived intensity. Rear channels are mixed more diffusely or attenuated to maintain focus on the front channels.

在这些矩阵中,左前和右前通道通常直接通过。当环绕声道在左右立体声声道之间分割时,会选择系数,使其平方和为1,这有助于保持感知强度。后声道的混合更加分散或衰减,以保持对前声道的聚焦。

   L output = ( 0.585786 * left + 0.414214 * center                    )
   R output = (                   0.414214 * center + 0.585786 * right )
        

Exact coefficient values are 1 and 1/sqrt(2), multiplied by 1/(1 + 1/sqrt(2)) for normalization.

精确的系数值为1和1/sqrt(2),乘以1/(1+1/sqrt(2))进行归一化。

Figure 4: Stereo Downmix Matrix for the Linear Surround Channel Mapping

图4:线性环绕声道映射的立体声下混音矩阵
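
As a check on Figure 4, the coefficients can be derived from the exact values instead of being hard-coded. This is a small sketch with a hypothetical function name; samples are assumed to be floating-point values.

```python
import math


def linear_surround_downmix(left, center, right):
    """Downmix one (left, center, right) frame to stereo per Figure 4."""
    norm = 1.0 / (1.0 + 1.0 / math.sqrt(2.0))   # about 0.585786
    center_coeff = norm / math.sqrt(2.0)        # about 0.414214
    return (norm * left + center_coeff * center,
            center_coeff * center + norm * right)
```

Because each row sums to 1, a frame with all three inputs at full scale produces full-scale output without clipping.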

       /          \   /                                     \ / FL \
       | L output |   | 0.422650 0.000000 0.366025 0.211325 | | FR |
       | R output | = | 0.000000 0.422650 0.211325 0.366025 | | RL |
       \          /   \                                     / \ RR /
        

Exact coefficient values are 1, sqrt(3)/2 and 1/2, multiplied by 1/(1 + sqrt(3)/2 + 1/2) for normalization.

精确的系数值为1,sqrt(3)/2和1/2,乘以1/(1+sqrt(3)/2+1/2)进行归一化。

Figure 5: Stereo Downmix Matrix for the Quadraphonic Channel Mapping

图5:四声道映射的立体声下混频矩阵

                                                               / FL \
      /   \   /                                              \ | FC |
      | L |   | 0.650802 0.460186 0.000000 0.563611 0.325401 | | FR |
      | R | = | 0.000000 0.460186 0.650802 0.325401 0.563611 | | RL |
      \   /   \                                              / | RR |
                                                               \    /
        

Exact coefficient values are 1, 1/sqrt(2), sqrt(3)/2 and 1/2, multiplied by 2/(1 + 1/sqrt(2) + sqrt(3)/2 + 1/2) for normalization.

精确的系数值为1、1/sqrt(2)、sqrt(3)/2和1/2,乘以2/(1+1/sqrt(2)+sqrt(3)/2+1/2)进行归一化。

Figure 6: Stereo Downmix Matrix for the 5.0 Surround Mapping

图6:5.0环绕贴图的立体声下混音矩阵

                                                                   /FL \
   / \   /                                                       \ |FC |
   |L|   | 0.529067 0.374107 0.000000 0.458186 0.264534 0.374107 | |FR |
   |R| = | 0.000000 0.374107 0.529067 0.264534 0.458186 0.374107 | |RL |
   \ /   \                                                       / |RR |
                                                                   \LFE/

Exact coefficient values are 1, 1/sqrt(2), sqrt(3)/2 and 1/2, multiplied by 2/(1 + 1/sqrt(2) + sqrt(3)/2 + 1/2 + 1/sqrt(2)) for normalization.

精确的系数值为1、1/sqrt(2)、sqrt(3)/2和1/2,乘以2/(1+1/sqrt(2)+sqrt(3)/2+1/2+1/sqrt(2))进行归一化。

Figure 7: Stereo Downmix Matrix for the 5.1 Surround Mapping

图7:5.1环绕贴图的立体声下混音矩阵

     /                                                                \
     | 0.455310 0.321953 0.000000 0.394310 0.227655 0.278819 0.321953 |
     | 0.000000 0.321953 0.455310 0.227655 0.394310 0.278819 0.321953 |
     \                                                                /
        

Exact coefficient values are 1, 1/sqrt(2), sqrt(3)/2, 1/2 and sqrt(3)/2/sqrt(2), multiplied by 2/(1 + 1/sqrt(2) + sqrt(3)/2 + 1/2 + sqrt(3)/2/sqrt(2) + 1/sqrt(2)) for normalization. The coefficients are in the same order as in Section 5.1.1.2 and the matrices above.

精确的系数值为1,1/sqrt(2),sqrt(3)/2,1/2和sqrt(3)/2/sqrt(2),乘以2/(1+1/sqrt(2)+sqrt(3)/2+1/2+sqrt(3)/2/sqrt(2)+1/sqrt(2))进行归一化。系数的顺序与第5.1.1.2节和上述矩阵中的顺序相同。

Figure 8: Stereo Downmix Matrix for the 6.1 Surround Mapping

图8:6.1环绕贴图的立体声下混音矩阵

    /                                                                 \
    | .388631 .274804 .000000 .336565 .194316 .336565 .194316 .274804 |
    | .000000 .274804 .388631 .194316 .336565 .194316 .336565 .274804 |
    \                                                                 /
        

Exact coefficient values are 1, 1/sqrt(2), sqrt(3)/2 and 1/2, multiplied by 2/(2 + 2/sqrt(2) + sqrt(3)) for normalization. The coefficients are in the same order as in Section 5.1.1.2 and the matrices above.

Figure 9: Stereo Downmix Matrix for the 7.1 Surround Mapping
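
The normalization described for Figures 8 and 9 can be checked numerically. The following is a minimal sketch, not part of the specification: the names ROW_6_1, ROW_7_1, and normalized_coefficients are illustrative assumptions, and each row lists the unnormalized coefficients of the first matrix row in column order. Every coefficient is scaled by 2 divided by the sum of the row and rounded to the six decimal places shown in the figures.

```python
import math

S2 = 1 / math.sqrt(2)      # 1/sqrt(2)
S3 = math.sqrt(3) / 2      # sqrt(3)/2

# Unnormalized first rows, in the column order of the matrices above
# (hypothetical names, chosen for this example only).
ROW_6_1 = [1.0, S2, 0.0, S3, 0.5, S3 * S2, S2]
ROW_7_1 = [1.0, S2, 0.0, S3, 0.5, S3, 0.5, S2]


def normalized_coefficients(row):
    """Scale a row by 2/sum(row) and round to six decimal places."""
    scale = 2.0 / sum(row)
    return [round(c * scale, 6) for c in row]
```

The second row of each matrix uses the same set of values with the left and right roles exchanged, so the same scale factor applies to it.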

5.2. Comment Header
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |      'O'      |      'p'      |      'u'      |      's'      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |      'T'      |      'a'      |      'g'      |      's'      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                     Vendor String Length                      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     :                        Vendor String...                       :
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                   User Comment List Length                    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                 User Comment #0 String Length                 |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     :                   User Comment #0 String...                   :
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                 User Comment #1 String Length                 |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     :                                                               :
        

Figure 10: Comment Header Packet

The comment header consists of a 64-bit 'magic signature' field, followed by data in the same format as the [VORBIS-COMMENT] header used in Ogg Vorbis, except (like Ogg Theora and Speex) the final 'framing bit' specified in the Vorbis specification is not present.

1. Magic Signature:

This is an 8-octet (64-bit) field that allows codec identification and is human readable. It contains, in order, the magic numbers:

   0x4F 'O'
   0x70 'p'
   0x75 'u'
   0x73 's'
   0x54 'T'
   0x61 'a'
   0x67 'g'
   0x73 's'

Starting with "Op" helps distinguish it from audio data packets, as this is an invalid TOC sequence.

2. Vendor String Length (32 bits, unsigned, little endian):

This field gives the length of the following vendor string, in octets. It MUST NOT indicate that the vendor string is longer than the rest of the packet.

3. Vendor String (variable length, UTF-8 vector):

This is a simple human-readable tag for vendor information, encoded as a UTF-8 string [RFC3629]. No terminating null octet is necessary.

This tag is intended to identify the codec encoder and encapsulation implementations, for tracing differences in technical behavior. User-facing applications can use the 'ENCODER' user comment tag to identify themselves.

4. User Comment List Length (32 bits, unsigned, little endian):

This field indicates the number of user-supplied comments. It MAY indicate there are zero user-supplied comments, in which case there are no additional fields in the packet. It MUST NOT indicate that there are so many comments that the comment string lengths would require more data than is available in the rest of the packet.

5. User Comment #i String Length (32 bits, unsigned, little endian):

This field gives the length of the following user comment string, in octets. There is one for each user comment indicated by the 'user comment list length' field. It MUST NOT indicate that the string is longer than the rest of the packet.

6. User Comment #i String (variable length, UTF-8 vector):

This field contains a single user comment encoded as a UTF-8 string [RFC3629]. There is one for each user comment indicated by the 'user comment list length' field.

The 'vendor string length' and 'user comment list length' fields are REQUIRED, and implementations SHOULD treat a stream as invalid if it contains a comment header that does not have enough data for these fields, or that does not contain enough data for the corresponding vendor string or user comments they describe. Making this check before allocating the associated memory to contain the data helps prevent a possible Denial-of-Service (DoS) attack from small comment headers that claim to contain strings longer than the entire packet or more user comments than could possibly fit in the packet.
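
The checks described above can be sketched as follows. This is an illustrative parser, not a reference implementation: the names parse_comment_header and InvalidCommentHeader are assumptions made for this example, and replacing malformed UTF-8 rather than rejecting it is a choice this document does not mandate. Every length field is compared against the data remaining in the packet before any string is copied, and the comment count is checked against the minimum of 4 octets each comment requires.

```python
import struct


class InvalidCommentHeader(ValueError):
    """Raised when a comment header fails a bounds check (hypothetical name)."""


def parse_comment_header(packet: bytes):
    """Return (vendor, comments, trailing_data) for an 'OpusTags' packet."""
    if packet[:8] != b"OpusTags":
        raise InvalidCommentHeader("missing 'OpusTags' magic signature")
    pos = 8

    def read_u32():
        nonlocal pos
        if len(packet) - pos < 4:
            raise InvalidCommentHeader("truncated length field")
        (value,) = struct.unpack_from("<I", packet, pos)  # little endian
        pos += 4
        return value

    def read_string(length):
        nonlocal pos
        # Check the claimed length against what is left before slicing.
        if length > len(packet) - pos:
            raise InvalidCommentHeader("string longer than rest of packet")
        raw = packet[pos:pos + length]
        pos += length
        # Assumption: malformed UTF-8 is replaced instead of rejected.
        return raw.decode("utf-8", errors="replace")

    vendor = read_string(read_u32())
    count = read_u32()
    # Each comment needs at least a 4-octet length field.
    if count > (len(packet) - pos) // 4:
        raise InvalidCommentHeader("more comments than could fit in packet")
    comments = [read_string(read_u32()) for _ in range(count)]
    return vendor, comments, packet[pos:]
```

Python integers do not overflow, so the comparison against the remaining length is safe as written; a port to a language with fixed-width integers would need to keep the subtraction form used here rather than adding the claimed length to the current offset.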

Immediately following the user comment list, the comment header MAY contain zero-padding or other binary data that is not specified here. If the least-significant bit of the first byte of this data is 1, then editors SHOULD preserve the contents of this data when updating the tags, but if this bit is 0, all such data MAY be treated as padding, and truncated or discarded as desired. This allows informal experimentation with the format of this binary data until it can be specified later.
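
As a small illustration of the rule above, a tag editor might classify the data that follows the user comment list like this. The function name and the returned labels are hypothetical and chosen only for this sketch.

```python
def trailing_data_policy(trailing: bytes) -> str:
    """Classify the data found after the user comment list.

    Returns "none" when there is no such data, "preserve" when the
    least-significant bit of the first byte is 1, and "padding" when
    that bit is 0 (the data may then be truncated or discarded).
    """
    if not trailing:
        return "none"
    return "preserve" if trailing[0] & 1 else "padding"
```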

The comment header can be arbitrarily large and might be spread over a large number of Ogg pages. Implementations MUST avoid attempting to allocate excessive amounts of memory when presented with a very large comment header. To accomplish this, implementations MAY treat a stream as invalid if it has a comment header larger than 125,829,120 octets (120 MB), and MAY ignore individual comments that are not fully contained within the first 61,440 octets of the comment header.

5.2.1. Tag Definitions

The user comment strings follow the NAME=value format described by [VORBIS-COMMENT] with the same recommended tag names: ARTIST, TITLE, DATE, ALBUM, and so on.

Two new comment tags are introduced here:

First, an optional gain for track normalization:

R128_TRACK_GAIN=-573

representing the volume shift needed to normalize the track's volume during isolated playback, in random shuffle, and so on. The gain is a Q7.8 fixed-point number in dB, as in the ID header's 'output gain' field. This tag is similar to the REPLAYGAIN_TRACK_GAIN tag in Vorbis [REPLAY-GAIN], except that the normal volume reference is the [EBU-R128] standard.

Second, an optional gain for album normalization:

R128_ALBUM_GAIN=111

representing the volume shift needed to normalize the overall volume when played as part of a particular collection of tracks. The gain is also a Q7.8 fixed-point number in dB, as in the ID header's 'output gain' field. The values '-573' and '111' given here are just examples.

An Ogg Opus stream MUST NOT have more than one of each of these tags, and, if present, their values MUST be an integer from -32768 to 32767, inclusive, represented in ASCII as a base 10 number with no whitespace. A leading '+' or '-' character is valid. Leading zeros are also permitted, but the value MUST be represented by no more than 6 characters. Other non-digit characters MUST NOT be present.

If present, R128_TRACK_GAIN and R128_ALBUM_GAIN MUST correctly represent the R128 normalization gain relative to the 'output gain' field specified in the ID header. If a player chooses to make use of the R128_TRACK_GAIN tag or the R128_ALBUM_GAIN tag, it MUST apply those gains _in addition_ to the 'output gain' value. If a tool modifies the ID header's 'output gain' field, it MUST also update or remove the R128_TRACK_GAIN and R128_ALBUM_GAIN comment tags if present. A muxer SHOULD place the gain it wants other tools to use by default into the 'output gain' field, and not the comment tag.

To avoid confusion with multiple normalization schemes, an Opus comment header SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK tags, unless they are only to be used in some context where there is guaranteed to be no such confusion. [EBU-R128] normalization is preferred to the earlier REPLAYGAIN schemes because of its clear definition and adoption by industry. Peak normalizations are difficult to calculate reliably for lossy codecs because of variation in excursion heights due to decoder differences. In the authors' investigations, they were not applied consistently or broadly enough to merit inclusion here.
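
Returning to the R128_TRACK_GAIN and R128_ALBUM_GAIN tags defined in this section, the value constraints and the way a player combines a tag with the 'output gain' field can be sketched as follows. The function names are assumptions made for this example. The conversion to a linear factor uses the usual amplitude relation 10^(dB/20) with Q7.8 values expressed in units of 1/256 dB.

```python
import re

# Optional sign followed by ASCII digits only; no whitespace allowed.
_GAIN_RE = re.compile(r"[+-]?[0-9]+")


def parse_r128_gain(value: str):
    """Return the gain in Q7.8 units, or None if the value is not valid."""
    if len(value) > 6 or not _GAIN_RE.fullmatch(value):
        return None
    gain = int(value)
    if not -32768 <= gain <= 32767:
        return None
    return gain


def total_gain_db(output_gain_q8: int, r128_gain_q8: int) -> float:
    """Tag gain applied in addition to the ID header's output gain, in dB."""
    return (output_gain_q8 + r128_gain_q8) / 256.0


def gain_to_scale(gain_db: float) -> float:
    """Linear amplitude factor for a gain expressed in dB."""
    return 10.0 ** (gain_db / 20.0)
```

With the example value from this section, parse_r128_gain("-573") yields -573, which is about -2.24 dB when the 'output gain' field is zero.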

6. Packet Size Limits

Technically, valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets might be spread over a similarly enormous number of Ogg pages. When encoding, implementations SHOULD limit the use of padding in audio data packets to no more than is necessary to make a VBR stream CBR, unless they have no reasonable way to determine what is necessary. Demuxers SHOULD treat audio data packets as invalid (treat them as if they were malformed Opus packets with an invalid TOC sequence) if they are larger than 61,440 octets per Opus stream, unless they have a specific reason for allowing extra padding. Such packets necessarily contain more padding than needed to make a stream CBR. Demuxers MUST avoid attempting to allocate excessive amounts of memory when presented with a very large packet. Demuxers MAY treat audio data packets as invalid or partially process them if they are larger than 61,440 octets in an Ogg Opus stream with channel mapping families 0 or 1. Demuxers MAY treat audio data packets as invalid or partially process them in any Ogg Opus stream if the packet is larger than 61,440 octets and also larger than 7,680 octets per Opus stream. The presence of an extremely large packet in the stream could indicate a memory exhaustion attack or stream corruption.

In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of (61,298*N - 2) octets. With 255 streams, this is 15,630,988 octets and can span up to 61,298 Ogg pages, all but one of which will have a granule position of -1. This is, of course, a very extreme packet, consisting of 255 streams, each containing 120 ms of audio encoded as 2.5 ms frames, each frame using the maximum possible number of octets (1275) and stored in the least efficient manner allowed (a VBR code 3 Opus packet). Even in such a packet, most of the data will be zeros as 2.5 ms frames cannot actually use all 1275 octets.

The largest packet consisting of entirely useful data is (15,326*N - 2) octets. This corresponds to 120 ms of audio encoded as 10 ms frames in either SILK or Hybrid mode, but at a data rate of over 1 Mbps, which makes little sense for the quality achieved.

A more reasonable limit is (7,664*N - 2) octets. This corresponds to 120 ms of audio encoded as 20 ms stereo CELT mode frames, with a total bitrate just under 511 kbps (not counting the Ogg encapsulation overhead). For channel mapping family 1, N = 8 provides a reasonable upper bound, as it allows for each of the 8 possible output channels to be decoded from a separate stereo Opus stream. This gives a size of 61,310 octets, which is rounded up to a multiple of 1,024 octets to yield the audio data packet size of 61,440 octets that any implementation is expected to be able to process successfully.
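
The arithmetic in this section, together with the demuxer limits stated at its start, can be reproduced with a short sketch. All function and constant names below are illustrative assumptions; N is the stream count.

```python
MAX_EXPECTED_PACKET = 61440  # octets any implementation is expected to handle


def max_unpadded_packet_size(n: int) -> int:
    """Largest valid packet that uses no padding."""
    return 61298 * n - 2


def max_useful_packet_size(n: int) -> int:
    """Largest packet consisting entirely of useful data."""
    return 15326 * n - 2


def reasonable_packet_size(n: int) -> int:
    """The more reasonable limit based on 20 ms stereo CELT frames."""
    return 7664 * n - 2


def round_up(value: int, multiple: int = 1024) -> int:
    """Round value up to the next multiple."""
    return -(-value // multiple) * multiple


def should_reject(packet_size: int, n: int) -> bool:
    """SHOULD-level rule: larger than 61,440 octets per Opus stream."""
    return packet_size > MAX_EXPECTED_PACKET * n


def may_reject(packet_size: int, n: int, mapping_family: int) -> bool:
    """MAY-level rules for treating a packet as invalid."""
    if packet_size <= MAX_EXPECTED_PACKET:
        return False
    if mapping_family in (0, 1):
        return True
    return packet_size > 7680 * n
```

For example, round_up(reasonable_packet_size(8)) reproduces the 61,440-octet figure derived above for channel mapping family 1.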

7. Encoder Guidelines

When encoding Opus streams, Ogg muxers SHOULD take into account the algorithmic delay of the Opus encoder.

In encoders derived from the reference implementation [RFC6716], the number of samples can be queried with

    opus_encoder_ctl(encoder_state, OPUS_GET_LOOKAHEAD(&delay_samples));
        

To achieve good quality in the very first samples of a stream, implementations MAY use linear predictive coding (LPC) extrapolation to generate at least 120 extra samples at the beginning to avoid the Opus encoder having to encode a discontinuous signal. For more information on linear prediction, see [LINEAR-PREDICTION]. For an input file containing 'length' samples, the implementation SHOULD set the 'pre-skip' header value to (delay_samples + extra_samples), encode at least (length + delay_samples + extra_samples) samples, and set the granule position of the last page to (length + delay_samples + extra_samples). This ensures that the encoded file has the same duration as the original, with no time offset. The best way to pad the end of the stream is to also use LPC extrapolation, but zero-padding is also acceptable.
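
The bookkeeping in the paragraph above amounts to three sums. The sketch below is only an illustration: the function name plan_stream is hypothetical, and all quantities are assumed to be sample counts in the same units as 'length'.

```python
def plan_stream(length: int, delay_samples: int, extra_samples: int = 120):
    """Return (pre_skip, samples_to_encode, final_granule_position).

    length        -- number of samples in the input file
    delay_samples -- encoder lookahead, e.g. as reported by OPUS_GET_LOOKAHEAD
    extra_samples -- samples generated by extrapolation at the beginning
    """
    pre_skip = delay_samples + extra_samples
    samples_to_encode = length + delay_samples + extra_samples
    final_granule_position = length + delay_samples + extra_samples
    return pre_skip, samples_to_encode, final_granule_position
```

The decoded duration is the final granule position minus the pre-skip, which equals 'length' and so matches the original input. samples_to_encode is a minimum, since the encoder may need to pad the final frame.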

7.1. LPC Extrapolation

The first step in LPC extrapolation is to compute linear prediction coefficients [LPC-SAMPLE]. When extending the end of the signal, order-N (typically with N ranging from 8 to 40) LPC analysis is performed on a window near the end of the signal. The last N samples are used as memory to an infinite impulse response (IIR) filter.

The filter is then applied on a zero input to extrapolate the end of the signal. Let 'a(k)' be the kth LPC coefficient and 'x(n)' be the nth sample of the signal. Each new sample past the end of the signal is computed as

                                 N
                                ---
                         x(n) = \   a(k)*x(n - k)
                                /
                                ---
                               k = 1
        

The process is repeated independently for each channel. It is possible to extend the beginning of the signal by applying the same process backward in time. When extending the beginning of the signal, it is best to apply a "fade in" to the extrapolated signal, e.g., by multiplying it by a half-Hanning window [HANNING].
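
The recursion and the fade-in can be sketched for a single channel as follows. This example assumes the LPC coefficients a(1)..a(N) have already been computed by some analysis step that is not shown; the function names are illustrative, and the half-Hanning window is written here as 0.5*(1 - cos(pi*i/count)), which is one common form.

```python
import math


def lpc_extrapolate(history, coeffs, count):
    """Extend a signal past its end by 'count' samples.

    history -- known samples, oldest first (at least len(coeffs) of them)
    coeffs  -- LPC coefficients a(1)..a(N)
    """
    order = len(coeffs)
    if len(history) < order:
        raise ValueError("need at least N samples of history")
    buf = list(history[-order:])  # filter memory: the last N samples
    out = []
    for _ in range(count):
        # x(n) = sum over k = 1..N of a(k) * x(n - k); buf[-k] is x(n - k)
        sample = sum(coeffs[k - 1] * buf[-k] for k in range(1, order + 1))
        out.append(sample)
        buf.append(sample)
    return out


def extend_beginning(signal, coeffs, count):
    """Extrapolate 'count' samples before the start, with a fade-in."""
    # Apply the same process backward in time by reversing the signal.
    extra = lpc_extrapolate(signal[::-1], coeffs, count)[::-1]
    # Rising half of a Hanning window: 0 at the first sample, near 1 at the last.
    return [s * 0.5 * (1.0 - math.cos(math.pi * i / count))
            for i, s in enumerate(extra)]
```

For a multichannel signal, both functions would be called once per channel, each with that channel's own coefficients.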

7.2. Continuous Chaining

In some applications, such as Internet radio, it is desirable to cut a long stream into smaller chains, e.g., so the comment header can be updated. This can be done simply by separating the input streams into segments and encoding each segment independently. The drawback of this approach is that it creates a small discontinuity at the boundary due to the lossy nature of Opus. A muxer MAY avoid this discontinuity by using the following procedure:

1. Encode the last frame of the first segment as an independent frame by turning off all forms of inter-frame prediction. De-emphasis is allowed.

2. Set the granule position of the last page to a point near the end of the last frame.

3. Begin the second segment with a copy of the last frame of the first segment.

4. Set the 'pre-skip' value of the second stream in such a way as to properly join the two streams.

5. Continue the encoding process normally from there, without any reset to the encoder.

In encoders derived from the reference implementation, inter-frame prediction can be turned off by calling

     opus_encoder_ctl(encoder_state, OPUS_SET_PREDICTION_DISABLED(1));
        

For best results, this implementation requires that prediction be explicitly enabled again before resuming normal encoding, even after a reset.

8. Security Considerations

Implementations of the Opus codec need to take appropriate security considerations into account, as outlined in [RFC4732]. This is just as much a problem for the container as it is for the codec itself. Malicious payloads and/or input streams can be used to attack codec implementations. Implementations MUST NOT overrun their allocated memory nor consume excessive resources when decoding payloads or processing input streams. Although problems in encoding applications are typically rarer, this still applies to a muxer, as vulnerabilities would allow an attacker to attack transcoding gateways.

Header parsing code contains the most likely area for potential overruns. It is important for implementations to ensure their buffers contain enough data for all of the required fields before attempting to read it (for example, for all of the channel map data in the ID header). Implementations would do well to validate the indices of the channel map, also, to ensure they meet all of the restrictions outlined in Section 5.1.1, in order to avoid attempting to read data from channels that do not exist.

To avoid excessive resource usage, we advise implementations to be especially wary of streams that might cause them to process far more data than was actually transmitted. For example, a relatively small comment header may contain values for the string lengths or user comment list length that imply that it is many gigabytes in size. Even computing the size of the required buffer could overflow a 32-bit integer, and actually attempting to allocate such a buffer before verifying it would be a reasonable size is a bad idea. After reading the user comment list length, implementations might wish to verify that the header contains at least the minimum amount of data for that many comments (4 additional octets per comment, to indicate each has a length of zero) before proceeding any further, again taking care to avoid overflow in these calculations. If allocating an array of pointers to point at these strings, the size of the pointers may be larger than 4 octets, potentially requiring a separate overflow check.

Another bug in this class we have observed more than once involves the handling of invalid data at the end of a stream. Often, implementations will seek to the end of a stream to locate the last timestamp in order to compute its total duration. If they do not find a valid capture pattern and Ogg page from the desired logical stream, they will back up and try again. If care is not taken to avoid re-scanning data that was already scanned, this search can quickly devolve into something with a complexity that is quadratic in the amount of invalid data.
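
One way to keep such a search linear is to walk backward in fixed-size chunks and never revisit a chunk that has already been scanned. The sketch below is a simplification under stated assumptions: it only looks for the "OggS" capture pattern, whereas a real implementation would also validate the page (checksum and serial number) and continue the same backward walk when validation fails. The function name and the chunk parameter are hypothetical.

```python
CAPTURE = b"OggS"


def find_last_capture(f, size, chunk=65536):
    """Return the offset of the last capture pattern in f, or -1.

    f    -- seekable binary file object
    size -- total number of octets available in f
    """
    end = size
    tail = b""  # first octets of the chunk scanned previously
    while end > 0:
        start = max(0, end - chunk)
        f.seek(start)
        # Append the saved octets so a pattern that straddles the
        # chunk boundary is still found, without re-reading the chunk.
        block = f.read(end - start) + tail
        pos = block.rfind(CAPTURE)
        if pos >= 0:
            return start + pos
        tail = block[:len(CAPTURE) - 1]
        end = start
    return -1
```

Each octet is read once, plus at most three octets of overlap per chunk, so the work stays proportional to the amount of invalid data rather than to its square.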

In general, when seeking, implementations will wish to be cautious about the effects of invalid granule position values and ensure all algorithms will continue to make progress and eventually terminate, even if these are missing or out of order.

Like most other container formats, Ogg Opus streams SHOULD NOT be used with insecure ciphers or cipher modes that are vulnerable to known-plaintext attacks. Elements such as the Ogg page capture pattern and the 'magic signature' fields in the ID header and the comment header all have easily predictable values, in addition to various elements of the codec data itself.

9. Content Type

An "Ogg Opus file" consists of one or more sequentially multiplexed segments, each containing exactly one Ogg Opus stream. The RECOMMENDED mime-type for Ogg Opus files is "audio/ogg".

If more specificity is desired, one MAY indicate the presence of Opus streams using the codecs parameter defined in [RFC6381] and [RFC5334], e.g.,

                            audio/ogg; codecs=opus
        

for an Ogg Opus file.

The RECOMMENDED filename extension for Ogg Opus files is '.opus'.

When Opus is concurrently multiplexed with other streams in an Ogg container, one SHOULD use one of the "audio/ogg", "video/ogg", or "application/ogg" mime-types, as defined in [RFC5334]. Such streams are not strictly "Ogg Opus files" as described above, since they contain more than a single Opus stream per sequentially multiplexed segment, e.g., video or multiple audio tracks. In such cases, the '.opus' filename extension is NOT RECOMMENDED.

In either case, this document updates [RFC5334] to add "opus" as a codecs parameter value with char[8]: 'OpusHead' as Codec Identifier.

10. IANA Considerations

Per this document, IANA has updated the "Media Types" registry by adding .opus as a file extension for "audio/ogg" and adding itself as a reference alongside [RFC5334] for "audio/ogg", "video/ogg", and "application/ogg" Media Types.

This document defines a new registry "Opus Channel Mapping Families" to indicate how the semantic meanings of the channels in a multi-channel Opus stream are described. IANA has created a new namespace of "Opus Channel Mapping Families". This registry is listed on the IANA Matrix. Modifications to this registry follow the "Specification Required" registration policy as defined in [RFC5226]. Each registry entry consists of a Channel Mapping Family Number, which is specified in decimal in the range 0 to 255, inclusive, and a Reference (or list of references). Each Reference must point to sufficient documentation to describe what information is coded in the Opus identification header for this channel mapping family, how a demuxer determines the stream count ('N') and coupled stream count ('M') from this information, and how it determines the proper interpretation of each of the decoded channels.

This document defines three initial assignments for this registry.

                   +-------+---------------------------+
                   | Value | Reference                 |
                   +-------+---------------------------+
                   | 0     | RFC 7845, Section 5.1.1.1 |
                   |       |                           |
                   | 1     | RFC 7845, Section 5.1.1.2 |
                   |       |                           |
                   | 255   | RFC 7845, Section 5.1.1.3 |
                   +-------+---------------------------+
        

The designated expert will determine if the Reference points to a specification that meets the requirements for permanence and ready availability laid out in [RFC5226] and whether it specifies the information described above with sufficient clarity to allow interoperable implementations.

11. References
11.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC3533] Pfeiffer, S., "The Ogg Encapsulation Format Version 0", RFC 3533, DOI 10.17487/RFC3533, May 2003, <https://www.rfc-editor.org/info/rfc3533>.

[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November 2003, <https://www.rfc-editor.org/info/rfc3629>.

[RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, DOI 10.17487/RFC5226, May 2008, <https://www.rfc-editor.org/info/rfc5226>.

[RFC5334] Goncalves, I., Pfeiffer, S., and C. Montgomery, "Ogg Media Types", RFC 5334, DOI 10.17487/RFC5334, September 2008, <https://www.rfc-editor.org/info/rfc5334>.

[RFC6381] Gellens, R., Singer, D., and P. Frojdh, "The 'Codecs' and 'Profiles' Parameters for "Bucket" Media Types", RFC 6381, DOI 10.17487/RFC6381, August 2011, <https://www.rfc-editor.org/info/rfc6381>.

[RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716, September 2012, <https://www.rfc-editor.org/info/rfc6716>.

[EBU-R128] EBU Technical Committee, "Loudness Recommendation EBU R128", August 2011, <https://tech.ebu.ch/loudness>.

[VORBIS-COMMENT] Montgomery, C., "Ogg Vorbis I Format Specification: Comment Field and Header Specification", July 2002, <https://www.xiph.org/vorbis/doc/v-comment.html>.

11.2. Informative References

[RFC4732] Handley, M., Ed., Rescorla, E., Ed., and IAB, "Internet Denial-of-Service Considerations", RFC 4732, DOI 10.17487/RFC4732, December 2006, <https://www.rfc-editor.org/info/rfc4732>.

[RFC7587] Spittka, J., Vos, K., and JM. Valin, "RTP Payload Format for the Opus Speech and Audio Codec", RFC 7587, DOI 10.17487/RFC7587, June 2015, <https://www.rfc-editor.org/info/rfc7587>.

[FLAC] Coalson, J., "FLAC - Free Lossless Audio Codec Format Description", January 2008, <https://xiph.org/flac/format.html>.

[HANNING] Wikipedia, "Hann window", February 2016, <https://en.wikipedia.org/w/index.php?title=Window_function&oldid=703074467#Hann_.28Hanning.29_window>.

[LINEAR-PREDICTION] Wikipedia, "Linear Predictive Coding", October 2015, <https://en.wikipedia.org/w/index.php?title=Linear_predictive_coding&oldid=687498962>.

[LPC-SAMPLE] Degener, J. and C. Bormann, "Autocorrelation LPC coeff generation algorithm (Vorbis source code)", November 1994, <https://svn.xiph.org/trunk/vorbis/lib/lpc.c>.

[Q-NOTATION] Wikipedia, "Q (number format)", December 2015, <https://en.wikipedia.org/w/index.php?title=Q_%28number_format%29&oldid=697252615>.

[REPLAY-GAIN] Parker, C. and M. Leese, "VorbisComment: Replay Gain", June 2009, <https://wiki.xiph.org/VorbisComment#Replay_Gain>.

[SEEKING] Pfeiffer, S., Parker, C., and G. Maxwell, "Granulepos Encoding and How Seeking Really Works", May 2012, <https://wiki.xiph.org/Seeking>.

[VORBIS-MAPPING] Montgomery, C., "The Vorbis I Specification, Section 4.3.9 Output Channel Order", January 2010, <https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-810004.3.9>.

[VORBIS-TRIM] Montgomery, C., "The Vorbis I Specification, Appendix A: Embedding Vorbis into an Ogg stream", November 2008, <https://xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-132000A.2>.

[WAVE-MULTICHANNEL] Microsoft Corporation, "Multiple Channel Audio Data and WAVE Files", March 2007, <https://msdn.microsoft.com/en-us/windows/hardware/gg463006.aspx>.

Acknowledgments

Thanks to Ben Campbell, Joel M. Halpern, Mark Harris, Greg Maxwell, Christopher "Monty" Montgomery, Jean-Marc Valin, Stephan Wenger, and Mo Zanaty for their valuable contributions to this document. Additional thanks to Andrew D'Addesio, Greg Maxwell, and Vincent Penquerc'h for their feedback based on early implementations.

Authors' Addresses

   Timothy B. Terriberry
   Mozilla Corporation
   331 E. Evelyn Ave.
   Mountain View, CA 94041
   United States

   Phone: +1 650 903-0800
   Email: tterribe@xiph.org

   Ron Lee
   Voicetronix
   246 Pulteney Street, Level 1
   Adelaide, SA 5000
   Australia

   Phone: +61 8 8232 9112
   Email: ron@debian.org

   Ralph Giles
   Mozilla Corporation
   163 West Hastings Street
   Vancouver, BC V6B 1H5
   Canada

   Phone: +1 778 785 1540
   Email: giles@xiph.org