Internet Engineering Task Force (IETF)                         JM. Valin
Request for Comments: 6716                           Mozilla Corporation
Category: Standards Track                                         K. Vos
ISSN: 2070-1721                                  Skype Technologies S.A.
                                                           T. Terriberry
                                                     Mozilla Corporation
                                                          September 2012
        

Definition of the Opus Audio Codec

Abstract

This document defines the Opus interactive speech and audio codec. Opus is designed to handle a wide range of interactive audio applications, including Voice over IP, videoconferencing, in-game chat, and even live, distributed music performances. It scales from low bitrate narrowband speech at 6 kbit/s to very high quality stereo music at 510 kbit/s. Opus uses both Linear Prediction (LP) and the Modified Discrete Cosine Transform (MDCT) to achieve good compression of both speech and music.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6716.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

The licenses granted by the IETF Trust to this RFC under Section 3.c of the Trust Legal Provisions shall also include the right to extract text from Sections 1 through 8 and Appendix A and Appendix B of this RFC and create derivative works from these extracts, and to copy, publish, display and distribute such derivative works in any medium and for any purpose, provided that no such derivative work shall be presented, displayed or published in a manner that states or implies that it is part of this RFC or any other IETF Document.

Table of Contents

   1. Introduction
      1.1. Notation and Conventions
   2. Opus Codec Overview
      2.1. Control Parameters
           2.1.1. Bitrate
           2.1.2. Number of Channels (Mono/Stereo)
           2.1.3. Audio Bandwidth
           2.1.4. Frame Duration
           2.1.5. Complexity
           2.1.6. Packet Loss Resilience
           2.1.7. Forward Error Correction (FEC)
           2.1.8. Constant/Variable Bitrate
           2.1.9. Discontinuous Transmission (DTX)
   3. Internal Framing
      3.1. The TOC Byte
      3.2. Frame Packing
           3.2.1. Frame Length Coding
           3.2.2. Code 0: One Frame in the Packet
           3.2.3. Code 1: Two Frames in the Packet, Each with
                  Equal Compressed Size
           3.2.4. Code 2: Two Frames in the Packet, with
                  Different Compressed Sizes
           3.2.5. Code 3: A Signaled Number of Frames in the Packet
      3.3. Examples
      3.4. Receiving Malformed Packets
   4. Opus Decoder
      4.1. Range Decoder
           4.1.1. Range Decoder Initialization
           4.1.2. Decoding Symbols
           4.1.3. Alternate Decoding Methods
           4.1.4. Decoding Raw Bits
           4.1.5. Decoding Uniformly Distributed Integers
           4.1.6. Current Bit Usage
      4.2. SILK Decoder
           4.2.1. SILK Decoder Modules
           4.2.2. LP Layer Organization
           4.2.3. Header Bits
           4.2.4. Per-Frame LBRR Flags
           4.2.5. LBRR Frames
           4.2.6. Regular SILK Frames
           4.2.7. SILK Frame Contents
                  4.2.7.1. Stereo Prediction Weights
                  4.2.7.2. Mid-Only Flag
                  4.2.7.3. Frame Type
                  4.2.7.4. Subframe Gains
                  4.2.7.5. Normalized Line Spectral Frequency
                           (LSF) and Linear Predictive Coding (LPC)
                           Coefficients
                  4.2.7.6. Long-Term Prediction (LTP) Parameters
                  4.2.7.7. Linear Congruential Generator (LCG) Seed
                  4.2.7.8. Excitation
                  4.2.7.9. SILK Frame Reconstruction
           4.2.8. Stereo Unmixing
           4.2.9. Resampling
      4.3. CELT Decoder
           4.3.1. Transient Decoding
           4.3.2. Energy Envelope Decoding
           4.3.3. Bit Allocation
           4.3.4. Shape Decoding
           4.3.5. Anti-collapse Processing
           4.3.6. Denormalization
           4.3.7. Inverse MDCT
      4.4. Packet Loss Concealment (PLC)
           4.4.1. Clock Drift Compensation
      4.5. Configuration Switching
           4.5.1. Transition Side Information (Redundancy)
           4.5.2. State Reset
           4.5.3. Summary of Transitions
   5. Opus Encoder
      5.1. Range Encoder
           5.1.1. Encoding Symbols
           5.1.2. Alternate Encoding Methods
           5.1.3. Encoding Raw Bits
           5.1.4. Encoding Uniformly Distributed Integers
           5.1.5. Finalizing the Stream
           5.1.6. Current Bit Usage
      5.2. SILK Encoder
           5.2.1. Sample Rate Conversion
           5.2.2. Stereo Mixing
           5.2.3. SILK Core Encoder
      5.3. CELT Encoder
           5.3.1. Pitch Pre-filter
           5.3.2. Bands and Normalization
           5.3.3. Energy Envelope Quantization
           5.3.4. Bit Allocation
           5.3.5. Stereo Decisions
           5.3.6. Time-Frequency Decision
           5.3.7. Spreading Values Decision
           5.3.8. Spherical Vector Quantization
   6. Conformance
      6.1. Testing
      6.2. Opus Custom
   7. Security Considerations
   8. Acknowledgements
   9. References
      9.1. Normative References
      9.2. Informative References
   Appendix A. Reference Implementation
      A.1. Extracting the Source
      A.2. Up-to-Date Implementation
      A.3. Base64-Encoded Source Code
      A.4. Test Vectors
   Appendix B. Self-Delimiting Framing
        
1. Introduction

The Opus codec is a real-time interactive audio codec designed to meet the requirements described in [REQUIREMENTS]. It is composed of a layer based on Linear Prediction (LP) [LPC] and a layer based on the Modified Discrete Cosine Transform (MDCT) [MDCT]. The main idea behind using two layers is as follows: in speech, linear prediction techniques (such as Code-Excited Linear Prediction, or CELP) code low frequencies more efficiently than transform (e.g., MDCT) domain techniques, while the situation is reversed for music and higher speech frequencies. Thus, a codec with both layers available can operate over a wider range than either one alone and can achieve better quality by combining them than by using either one individually.

The primary normative part of this specification is provided by the source code in Appendix A. Only the decoder portion of this software is normative, though a significant amount of code is shared by both the encoder and decoder. Section 6 provides a decoder conformance test. The decoder contains a great deal of integer and fixed-point arithmetic that needs to be performed exactly, including all rounding considerations, so any useful specification requires domain-specific symbolic language to adequately define these operations. Additionally, any conflict between the symbolic representation and the included reference implementation must be resolved. For the practical reasons of compatibility and testability, it would be advantageous to give the reference implementation priority in any disagreement. The C language is also one of the most widely understood, human-readable symbolic representations for machine behavior. For these reasons, this RFC uses the reference implementation as the sole symbolic representation of the codec.

While the symbolic representation is unambiguous and complete, it is not always the easiest way to understand the codec's operation. For this reason, this document also describes significant parts of the codec in prose and takes the opportunity to explain the rationale behind many of the more surprising elements of the design. These descriptions are intended to be accurate and informative, but the limitations of common English sometimes result in ambiguity, so it is expected that the reader will always read them alongside the symbolic representation. Numerous references to the implementation are provided for this purpose. The descriptions sometimes differ from the reference in ordering or through mathematical simplification wherever such deviation makes an explanation easier to understand. For example, the right shift and left shift operations in the reference implementation are often described using division and multiplication in the text. In general, the text is focused on the "what" and "why" while the symbolic representation most clearly provides the "how".
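The correspondence between shifts and division deserves care for negative values. As a minimal sketch (the helper name floor_div_pow2 is hypothetical and does not appear in the reference implementation), the following C function computes the floor of x / 2**n. This is the value an arithmetic right shift produces on two's complement platforms, whereas C's own '/' operator truncates toward zero. Note that C leaves the result of right-shifting a negative signed value implementation-defined, so this sketch avoids shifting negative values altogether.

```c
/* Hypothetical helper, for illustration only: floor(x / 2**n), 0 <= n < 31.
 * On platforms where >> applied to a negative int is an arithmetic shift,
 * the result matches x >> n. */
static int floor_div_pow2(int x, int n)
{
    int d = 1 << n;
    int q = x / d;                  /* C division truncates toward zero */
    if (x % d != 0 && x < 0) {
        q--;                        /* step down to the floor for negative x */
    }
    return q;
}
```

For non-negative x the two operations agree (7 / 2 and 7 >> 1 are both 3). For x = -7 and n = 1, truncating division gives -3 while the floor is -4.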

1.1. Notation and Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

Various operations in the codec require bit-exact fixed-point behavior, even when writing a floating-point implementation. The notation "Q<n>", where n is an integer, denotes the number of binary digits to the right of the binary point in a fixed-point number. For example, a signed Q14 value in a 16-bit word can represent values from -2.0 to 1.99993896484375, inclusive. This notation is for informational purposes only. Arithmetic, when described, always operates on the underlying integer. For example, the text will explicitly indicate any shifts required after a multiplication.
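The Q notation can be illustrated with a short C sketch. The names Q14_ONE, q14_to_double, and q14_mul are hypothetical and are not taken from the reference implementation. The sketch shows the range of a signed Q14 value in a 16-bit word and the explicit shift needed after a multiplication: the product of two Q14 values is a Q28 value, so it is shifted right by 14 bits to return to Q14. The shift assumes an arithmetic right shift for negative products, which C leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define Q14_ONE (1 << 14)           /* the integer that represents 1.0 in Q14 */

/* Interpret a 16-bit integer as a Q14 value. */
static double q14_to_double(int16_t v)
{
    return (double)v / Q14_ONE;
}

/* Multiply two Q14 values and return a Q14 result.
 * The 32-bit product is in Q28, so shift right by 14 bits afterwards.
 * Assumes >> on a negative value is an arithmetic shift. */
static int32_t q14_mul(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;   /* Q28 */
    return prod >> 14;                         /* back to Q14 */
}
```

For example, 0.5 is stored as 8192 and 1.5 as 24576. Their Q28 product is 201326592, and shifting right by 14 gives 12288, which is 0.75 in Q14.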

Expressions, where included in the text, follow C operator rules and precedence, with the exception that the syntax "x**y" indicates x raised to the power y. The text also makes use of the following functions.

1.1.1. min(x,y)

The smaller of the two values x and y.

1.1.2. max(x,y)

The larger of the two values x and y.

1.1.3. clamp(lo,x,hi)

                     clamp(lo,x,hi) = max(lo,min(x,hi))

With this definition, if lo > hi, then lo is returned.
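The three definitions above translate directly into C. The following is a minimal sketch for int arguments. The *_sketch names are hypothetical and are chosen to avoid implying that these are the reference implementation's own helpers.

```c
/* Illustrative integer versions of min, max, and clamp as defined in the
 * text.  The *_sketch names are hypothetical. */
static int min_sketch(int x, int y) { return x < y ? x : y; }
static int max_sketch(int x, int y) { return x > y ? x : y; }

/* clamp(lo,x,hi) = max(lo,min(x,hi)) */
static int clamp_sketch(int lo, int x, int hi)
{
    return max_sketch(lo, min_sketch(x, hi));
}
```

Because max() is applied last, lo wins whenever the bounds are inverted: clamp_sketch(10, 5, 0) evaluates min(5, 0) = 0 and then max(10, 0) = 10.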

1.1.4. sign(x)

The sign of x, i.e.,

                                    ( -1,  x < 0
                          sign(x) = <  0,  x == 0
                                    (  1,  x > 0
        
1.1.5. abs(x)

The absolute value of x, i.e.,

                             abs(x) = sign(x)*x
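As a small sketch of these two definitions for int arguments (the *_sketch names are hypothetical), sign(x) can be written without branches as the difference of two comparisons, and abs(x) then follows the formula above literally. As with any int absolute value in C, the result is undefined for the most negative int, which has no positive counterpart.

```c
/* Hypothetical illustrative helpers for sign(x) and abs(x). */

/* Returns -1, 0, or 1 for negative, zero, or positive x. */
static int sign_sketch(int x)
{
    return (x > 0) - (x < 0);
}

/* abs(x) = sign(x)*x; undefined for the most negative int. */
static int abs_sketch(int x)
{
    return sign_sketch(x) * x;
}
```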
        
1.1.6. floor(f)

The largest integer z such that z <= f.

1.1.7. ceil(f)

The smallest integer z such that z >= f.

1.1.8. round(f)

The integer z nearest to f, with ties rounded towards negative infinity, i.e.,

                           round(f) = ceil(f - 0.5)
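This tie-breaking rule differs from the C library's round(), which rounds halfway cases away from zero. The following sketch (the name round_ties_down is hypothetical) evaluates ceil(f - 0.5) without the math library, assuming f - 0.5 fits in a long.

```c
/* Hypothetical helper: round(f) = ceil(f - 0.5), so ties go toward
 * negative infinity.  Assumes f - 0.5 is within the range of long. */
static long round_ties_down(double f)
{
    double g = f - 0.5;
    long z = (long)g;           /* conversion truncates toward zero */
    if ((double)z < g) {
        z++;                    /* truncation went below g: move up to ceil(g) */
    }
    return z;
}
```

With this definition 0.5 rounds to 0, 2.5 rounds to 2, and -2.5 rounds to -3, while non-tie values such as 2.4 and 2.6 round to 2 and 3 as usual.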
        
1.1.9. log2(f)

The base-two logarithm of f.

1.1.10. ilog(n)

The minimum number of bits required to store a positive integer n in binary, or 0 for a non-positive integer n.

                              ( 0,                 n <= 0
                    ilog(n) = <
                              ( floor(log2(n))+1,  n > 0

Examples:

o ilog(-1) = 0

o ilog(0) = 0

o ilog(1) = 1

o ilog(2) = 2

o ilog(3) = 2

o ilog(4) = 3

o ilog(7) = 3
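One straightforward way to compute ilog(n) is to count how many times n can be halved before it reaches zero. The sketch below (the name ilog_sketch is hypothetical) is a direct reading of the definition above, not the reference implementation's routine.

```c
/* Hypothetical helper: number of bits needed to store n in binary,
 * or 0 when n <= 0.  Equivalent to floor(log2(n)) + 1 for n > 0. */
static int ilog_sketch(int n)
{
    int bits = 0;
    while (n > 0) {
        bits++;
        n >>= 1;                /* n is positive here, so the shift is well defined */
    }
    return bits;
}
```

The loop reproduces each of the examples listed above. For instance, 4 is halved three times (4, 2, 1) before reaching zero, so ilog_sketch(4) is 3.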

2. Opus Codec Overview

The Opus codec scales from 6 kbit/s narrowband mono speech to 510 kbit/s fullband stereo music, with algorithmic delays ranging from 5 ms to 65.2 ms. At any given time, either the LP layer, the MDCT layer, or both, may be active. It can seamlessly switch between all of its various operating modes, giving it a great deal of flexibility to adapt to varying content and network conditions without renegotiating the current session. The codec allows input and output of various audio bandwidths, defined as follows:

   +----------------------+-----------------+-------------------------+
   | Abbreviation         | Audio Bandwidth | Sample Rate (Effective) |
   +----------------------+-----------------+-------------------------+
   | NB (narrowband)      |           4 kHz |                   8 kHz |
   |                      |                 |                         |
   | MB (medium-band)     |           6 kHz |                  12 kHz |
   |                      |                 |                         |
   | WB (wideband)        |           8 kHz |                  16 kHz |
   |                      |                 |                         |
   | SWB (super-wideband) |          12 kHz |                  24 kHz |
   |                      |                 |                         |
   | FB (fullband)        |      20 kHz (*) |                  48 kHz |
   +----------------------+-----------------+-------------------------+
        

Table 1

(*) Although the sampling theorem allows a bandwidth as large as half the sampling rate, Opus never codes audio above 20 kHz, as that is the generally accepted upper limit of human hearing.
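For implementers, Table 1 can be carried as a small lookup table. The following C sketch uses hypothetical type and identifier names; only the numeric values come from the table. It makes the fullband exception explicit: for every bandwidth except FB, the effective sample rate is exactly twice the audio bandwidth.

```c
/* Hypothetical names; the values are those of Table 1. */
typedef enum { BW_NB, BW_MB, BW_WB, BW_SWB, BW_FB, BW_COUNT } bandwidth_sketch;

typedef struct {
    const char *abbrev;          /* abbreviation used in the text */
    int audio_bandwidth_hz;      /* highest audio frequency coded */
    int effective_rate_hz;       /* effective sample rate */
} bandwidth_info;

static const bandwidth_info BANDWIDTHS[BW_COUNT] = {
    { "NB",   4000,  8000 },
    { "MB",   6000, 12000 },
    { "WB",   8000, 16000 },
    { "SWB", 12000, 24000 },
    { "FB",  20000, 48000 },     /* coded bandwidth capped at 20 kHz */
};
```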

Opus defines super-wideband (SWB) with an effective sample rate of 24 kHz, unlike some other audio coding standards that use 32 kHz. This was chosen for a number of reasons. The band layout in the MDCT layer naturally allows skipping coefficients for frequencies over 12 kHz, but does not allow cleanly dropping just those frequencies over 16 kHz. A sample rate of 24 kHz also makes resampling in the MDCT layer easier, as 24 evenly divides 48, and when 24 kHz is sufficient, it can save computation in other processing, such as Acoustic Echo Cancellation (AEC). Experimental changes to the band layout to allow a 16 kHz cutoff (32 kHz effective sample rate) showed potential quality degradations at other sample rates, and, at typical bitrates, the number of bits saved by using such a cutoff instead of coding in fullband (FB) mode is very small. Therefore, if an application wishes to process a signal sampled at 32 kHz, it should just use FB.

The LP layer is based on the SILK codec [SILK]. It supports NB, MB, or WB audio and frame sizes from 10 ms to 60 ms, and requires an additional 5 ms look-ahead for noise shaping estimation. A small additional delay (up to 1.5 ms) may be required for sampling rate conversion. Like Vorbis [VORBIS-WEBSITE] and many other modern codecs, SILK is inherently designed for variable bitrate (VBR) coding, though the encoder can also produce constant bitrate (CBR) streams. The version of SILK used in Opus is substantially modified from, and not compatible with, the stand-alone SILK codec previously deployed by Skype. This document does not serve to define that format, but those interested in the original SILK codec should see [SILK] instead.

The MDCT layer is based on the Constrained-Energy Lapped Transform (CELT) codec [CELT]. It supports NB, WB, SWB, or FB audio and frame sizes from 2.5 ms to 20 ms, and requires an additional 2.5 ms look-ahead due to the overlapping MDCT windows. The CELT codec is inherently designed for CBR coding, but unlike many CBR codecs, it is not limited to a set of predetermined rates. It internally allocates bits to exactly fill any given target budget, and an encoder can produce a VBR stream by varying the target on a per-frame basis. The MDCT layer is not used for speech when the audio bandwidth is WB or less, as it is not useful there. On the other hand, non-speech signals are not always adequately coded using linear prediction. Therefore, the MDCT layer should be used for music signals.

A "Hybrid" mode allows the use of both layers simultaneously with a frame size of 10 or 20 ms and an SWB or FB audio bandwidth. The LP layer codes the low frequencies by resampling the signal down to WB. The MDCT layer follows, coding the high frequency portion of the signal. The cutoff between the two lies at 8 kHz, the maximum WB audio bandwidth. In the MDCT layer, all bands below 8 kHz are discarded, so there is no coding redundancy between the two layers.
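The constraints on the three operating modes can be summarized as a validity check. The sketch below uses hypothetical type and function names. The discrete frame durations (10, 20, 40, and 60 ms for the LP layer; 2.5, 5, 10, and 20 ms for the MDCT layer) are taken from the configuration table in Section 3.1 rather than from the ranges quoted in this overview. Durations are expressed in tenths of a millisecond so that 2.5 ms can be an integer.

```c
/* Hypothetical names, for illustration only. */
typedef enum { MODE_LP_ONLY, MODE_HYBRID, MODE_MDCT_ONLY } mode_sketch;
typedef enum { BW_NB, BW_MB, BW_WB, BW_SWB, BW_FB } bandwidth_sketch;

/* Returns 1 if v occurs in set[0..n-1], else 0. */
static int in_set(int v, const int *set, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (set[i] == v) return 1;
    }
    return 0;
}

/* Returns 1 if the mode/bandwidth/frame-duration combination is one the
 * overview describes, else 0.  frame_tenths_ms is the frame duration in
 * units of 0.1 ms (e.g., 25 means 2.5 ms). */
static int config_is_valid(mode_sketch mode, bandwidth_sketch bw,
                           int frame_tenths_ms)
{
    static const int lp_frames[]     = { 100, 200, 400, 600 };
    static const int hybrid_frames[] = { 100, 200 };
    static const int mdct_frames[]   = { 25, 50, 100, 200 };

    switch (mode) {
    case MODE_LP_ONLY:    /* NB, MB, or WB */
        return bw <= BW_WB && in_set(frame_tenths_ms, lp_frames, 4);
    case MODE_HYBRID:     /* SWB or FB */
        return (bw == BW_SWB || bw == BW_FB)
            && in_set(frame_tenths_ms, hybrid_frames, 2);
    case MODE_MDCT_ONLY:  /* NB, WB, SWB, or FB (no MB) */
        return bw != BW_MB && in_set(frame_tenths_ms, mdct_frames, 4);
    }
    return 0;
}
```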

The sample rate (in contrast to the actual audio bandwidth) can be chosen independently on the encoder and decoder side, e.g., a fullband signal can be decoded as wideband, or vice versa. This approach ensures a sender and receiver can always interoperate, regardless of the capabilities of their actual audio hardware. Internally, the LP layer always operates at a sample rate of twice the audio bandwidth, up to a maximum of 16 kHz, which it continues to use for SWB and FB. The decoder simply resamples its output to support different sample rates. The MDCT layer always operates internally at a sample rate of 48 kHz. Since all the supported sample rates evenly divide this rate, and since the decoder may easily zero out the high frequency portion of the spectrum in the frequency domain, it can simply decimate the MDCT layer output to achieve the other supported sample rates very cheaply.
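The two internal-rate rules in this paragraph reduce to simple arithmetic, sketched below with hypothetical function names. The LP layer runs at twice the audio bandwidth, capped at 16 kHz. The MDCT layer always runs at 48 kHz, so an output rate that evenly divides 48 kHz is reached by decimating by an integer factor.

```c
/* Hypothetical helpers, for illustration only. */

/* Internal LP-layer sample rate: twice the audio bandwidth, capped at
 * 16 kHz (the cap applies to SWB and FB). */
static int lp_internal_rate_hz(int audio_bandwidth_hz)
{
    int rate = 2 * audio_bandwidth_hz;
    return rate > 16000 ? 16000 : rate;
}

/* Integer decimation factor from the MDCT layer's internal 48 kHz rate
 * to out_rate_hz, or 0 if out_rate_hz does not evenly divide 48 kHz. */
static int mdct_decimation_factor(int out_rate_hz)
{
    if (out_rate_hz <= 0 || 48000 % out_rate_hz != 0) {
        return 0;
    }
    return 48000 / out_rate_hz;
}
```

For the effective sample rates in Table 1 (8, 12, 16, 24, and 48 kHz), the decimation factors are 6, 4, 3, 2, and 1, respectively.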

After conversion to the common, desired output sample rate, the decoder simply adds the output from the two layers together. To compensate for the different look-ahead required by each layer, the CELT encoder input is delayed by an additional 2.7 ms. This ensures that low frequencies and high frequencies arrive at the same time. This extra delay may be reduced by an encoder by using less look-ahead for noise shaping or using a simpler resampler in the LP layer, but this will reduce quality. However, the base 2.5 ms look-ahead in the CELT layer cannot be reduced in the encoder because it is needed for the MDCT overlap, whose size is fixed by the decoder.
