Network Working Group                                       G. Camarillo
Request for Comments: 3960                                      Ericsson
Category: Informational                                   H. Schulzrinne
                                                     Columbia University
                                                           December 2004
        

Early Media and Ringing Tone Generation in the Session Initiation Protocol (SIP)

Status of This Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2004).

Abstract

This document describes how to manage early media in the Session Initiation Protocol (SIP) using two models: the gateway model and the application server model. It also describes the inputs one needs to consider in defining local policies for ringing tone generation.

Table of Contents

   1.  Introduction
   2.  Session Establishment in SIP
   3.  The Gateway Model
       3.1.  Forking
       3.2.  Ringing Tone Generation
       3.3.  Absence of an Early Media Indicator
       3.4.  Applicability of the Gateway Model
   4.  The Application Server Model
       4.1.  In-Band Versus Out-of-Band Session Progress Information
   5.  Alert-Info Header Field
   6.  Security Considerations
   7.  Acknowledgments
   8.  References
       8.1.  Normative References
       8.2.  Informative References
       Authors' Addresses
       Full Copyright Statement
        
1. Introduction

Early media refers to media (e.g., audio and video) that is exchanged before a particular session is accepted by the called user. Within a dialog, early media occurs from the moment the initial INVITE is sent until the User Agent Server (UAS) generates a final response. It may be unidirectional or bidirectional, and can be generated by the caller, the callee, or both. Typical examples of early media generated by the callee are ringing tone and announcements (e.g., queuing status). Early media generated by the caller typically consists of voice commands or dual tone multi-frequency (DTMF) tones to drive interactive voice response (IVR) systems.

The basic SIP specification (RFC 3261 [1]) only supports very simple early media mechanisms. These simple mechanisms have a number of problems which relate to forking and security, and do not satisfy the requirements of most applications. This document goes beyond the mechanisms defined in RFC 3261 [1] and describes two models of early media implementations using SIP: the gateway model and the application server model.

Although both early media models described in this document are superior to the one specified in RFC 3261 [1], the gateway model still presents a set of issues. In particular, the gateway model does not work well with forking. Nevertheless, the gateway model is needed because some SIP entities (in particular, some gateways) cannot implement the application server model.

The application server model addresses some of the issues present in the gateway model. This model uses the early-session disposition type, which is specified in [2].

The remainder of this document is organized as follows: Section 2 describes the offer/answer model in the absence of early media, and Section 3 introduces the gateway model. In this model, the early media session is established using the early dialog established by the original INVITE. Sections 3.1, 3.2, and 3.4 describe the limitations of the gateway model and the scenarios where it is appropriate to use this model. Section 4 introduces the application server model, which, as stated previously, resolves some of the issues present in the gateway model. Section 5 discusses how the Alert-Info header field interacts with both early media models.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [9].

2. Session Establishment in SIP

Before presenting both early media models, we will briefly summarize how session establishment works in SIP. This will let us keep separate features that are intrinsic to SIP (e.g., media being played before the 200 (OK) to avoid media clipping) from early media operations.

SIP [1] uses the offer/answer model [3] to negotiate session parameters. One of the user agents - the offerer - prepares a session description that is called the offer. The other user agent - the answerer - responds with another session description called the answer. This two-way handshake allows both user agents to agree upon the session parameters to be used to exchange media.

The offer/answer model decouples the offer/answer exchange from the messages used to transport the session descriptions. For example, the offer can be sent in an INVITE request and the answer can arrive in the 200 (OK) response for that INVITE, or, alternatively, the offer can be sent in the 200 (OK) for an empty INVITE and the answer can be sent in the ACK. When reliable provisional responses [4] and UPDATE requests [5] are used, there are many more possible ways to exchange offers and answers.

Media clipping occurs when the user (or the machine generating media) believes that the media session is already established, but the establishment process has not finished yet. The user starts speaking (i.e., generating media) and the first few syllables or even the first few words are lost.

When the offer/answer exchange takes place in the 200 (OK) response and in the ACK, media clipping is unavoidable. The called user starts speaking at the same time the 200 (OK) is sent, but the UAS cannot send any media until the answer from the User Agent Client (UAC) arrives in the ACK.

On the other hand, media clipping does not appear in the most common offer/answer exchange (an INVITE with an offer and a 200 (OK) with an answer). UACs are ready to play incoming media packets as soon as they send an offer, because they cannot count on the reception of the 200 (OK) to start playing out media for the caller; SIP signalling and media packets typically traverse different paths, and so, media packets may arrive before the 200 (OK) response.

Another form of media clipping (not related to early media either) occurs in the caller-to-callee direction. When the callee picks up and starts speaking, the UAS sends a 200 (OK) response with an answer, in parallel with the first media packets. If the first media packets arrive at the UAC before the answer and the caller starts speaking, the UAC cannot send media until the 200 (OK) response from the UAS arrives.

3. The Gateway Model

SIP uses the offer/answer model to negotiate session parameters (as described in Section 2). An offer/answer exchange that takes place before a final response for the INVITE is sent establishes an "early" media session. Early media sessions terminate when a final response for the INVITE is sent. If the final response is a 200 (OK), the early media session transitions to a regular media session. If the final response is a non-200 class final response, the early media session is simply terminated.
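
The lifecycle described above can be sketched as a small state transition. This is only an illustration: the names `MediaSessionState` and `on_final_response` are hypothetical and do not come from any SIP library, and the sketch treats every 2xx response the same way as a 200 (OK).

```python
from enum import Enum


class MediaSessionState(Enum):
    """Hypothetical states for a media session tied to an INVITE."""
    EARLY = "early"
    REGULAR = "regular"
    TERMINATED = "terminated"


def on_final_response(state: MediaSessionState, status_code: int) -> MediaSessionState:
    """Return the session state after a final response to the INVITE.

    Final responses are assumed to be status codes 200 through 699;
    1xx responses are provisional and are rejected here.
    """
    if not 200 <= status_code <= 699:
        raise ValueError(f"not a final response: {status_code}")
    if state is not MediaSessionState.EARLY:
        # Only early media sessions are affected by this rule.
        return state
    if status_code <= 299:
        # A 2xx response turns the early session into a regular one.
        return MediaSessionState.REGULAR
    # Any other final response simply terminates the early session.
    return MediaSessionState.TERMINATED
```

For example, an early session that sees a 486 (Busy Here) ends up in the `TERMINATED` state, while one that sees a 200 (OK) becomes `REGULAR`.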

Not surprisingly, media exchanged within an early media session is referred to as early media. The gateway model consists of managing early media sessions using offer/answer exchanges in reliable provisional responses, PRACKs, and UPDATEs.

The gateway model is seriously limited in the presence of forking, as described in Section 3.1. Therefore, its use is only acceptable when the User Agent (UA) cannot distinguish between early and regular media, as described in Section 3.4. In any other situation (the majority of UAs), use of the application server model described in Section 4 is strongly recommended instead.

3.1. Forking

In the absence of forking, assuming that the initial INVITE contains an offer, the gateway model does not introduce media clipping. Following normal SIP procedures, the UAC is ready to play any incoming media as soon as it sends the initial offer in the INVITE. The UAS sends the answer in a reliable provisional response and can send media as soon as there is media to send. Even if the first media packets arrive at the UAC before the 1xx response, the UAC will play them.

Note that, in some situations, the UAC needs to receive the answer before being able to play any media. UAs in such a situation (e.g., QoS, media authorization, or media encryption is required) use preconditions to avoid media clipping.

On the other hand, if the INVITE forks, the gateway model may introduce media clipping. This happens when the UAC receives different answers to its offer in several provisional responses from different UASs. The UAC has to deal with bandwidth limitations and early media session selection.

If the UAC receives early media from different UASs, it needs to present it to the user. If the early media consists of audio, playing several audio streams to the user at the same time may be confusing. On the other hand, other media types (e.g., video) can be presented to the user at the same time. For example, the UAC can build a mosaic with the different inputs.

However, even with media types that can be played at the same time to the user, if the UAC has limited bandwidth, it will not be able to receive early media from all the different UASs at the same time. Therefore, the UAC often needs to choose a single early media session and "mute" the others by sending UPDATE requests.

It is difficult to decide which early media sessions carry more important information from the caller's perspective. In fact, in some scenarios, the UA cannot even correlate media packets with their particular SIP early dialog. Therefore, UACs typically pick one early dialog randomly and mute the rest.
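
A minimal sketch of this "pick one, mute the rest" behavior follows. The function name `select_early_dialog` and the use of plain strings as dialog identifiers are assumptions made for illustration; the sketch only decides which dialogs to mute and leaves the actual UPDATE requests to the caller of the function.

```python
import random
from typing import List, Optional, Sequence, Tuple


def select_early_dialog(
    dialog_ids: Sequence[str],
    rng: Optional[random.Random] = None,
) -> Tuple[str, List[str]]:
    """Pick one early dialog at random and return (kept, muted).

    dialog_ids are hypothetical identifiers for the early dialogs
    created by a forked INVITE. The returned muted list names the
    dialogs for which the UAC would send an UPDATE to mute the media.
    """
    if not dialog_ids:
        raise ValueError("no early dialogs to choose from")
    rng = rng or random.Random()
    kept = rng.choice(list(dialog_ids))
    muted = [dialog_id for dialog_id in dialog_ids if dialog_id != kept]
    return kept, muted
```

Passing a seeded `random.Random` instance makes the choice reproducible in tests. A real UAC might prefer a less arbitrary rule when it is able to correlate media packets with dialogs.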

If one of the early media sessions that was muted transitions to a regular media session (i.e., the UAS sends a 2xx response), media clipping is likely. The UAC typically sends an UPDATE with a new offer (upon reception of the 200 (OK) for the INVITE) to unmute the media session. The UAS cannot send any media until it receives the offer from the UAC. Therefore, if the callee starts speaking before the offer from the UAC is received, the callee's first words will be lost.

Having the UAS send the UPDATE to unmute the media session (instead of the UAC) does not avoid media clipping in the backward direction and it causes possible race conditions.

3.2. Ringing Tone Generation

In the PSTN, telephone switches typically play ringing tones for the caller, indicating that the callee is being alerted. When, where, and how these ringing tones are generated has been standardized (i.e., the local exchange of the callee generates a standardized ringing tone while the callee is being alerted). It makes sense for a standardized approach to provide this type of feedback for the user in a homogeneous environment such as the PSTN, where all the terminals have a similar user interface.

This homogeneity is not found among SIP user agents. SIP user agents have different capabilities, different user interfaces, and may be used to establish sessions that do not involve audio at all. Because of this, the way a SIP UA provides the user with information about the progress of session establishment is a matter of local policy. For example, a UA with a Graphical User Interface (GUI) may choose to display a message on the screen when the callee is being alerted, while another UA may choose to show a picture of a phone ringing instead.

Many SIP UAs choose to imitate the user interface of the PSTN phones. They provide a ringing tone to the caller when the callee is being alerted. Such a UAC is supposed to generate ringing tones locally for its user as long as no early media is received from the UAS. If the UAS generates early media (e.g., an announcement or a special ringing tone), the UAC is supposed to play it rather than generate the ringing tone locally.

The problem is that, sometimes, it is not an easy task for a UAC to know whether it will be receiving early media or it should generate local ringing. A UAS can send early media without using reliable provisional responses (very simple UASs do that) or it can send an answer in a reliable provisional response without any intention of sending early media (this is the case when preconditions are used). Therefore, by only looking at the SIP signalling, a UAC cannot be sure whether or not there will be early media for a particular session. The UAC needs to check if media packets are arriving at a given moment.

An implementation could even choose to look at the contents of the media packets, since they could carry only silence or comfort noise.

With this in mind, a UAC should develop its local policy regarding local ringing generation. For example, a POTS ("Plain Old Telephone Service")-like SIP User Agent (UA) could implement the following local policy:

1. Unless a 180 (Ringing) response is received, never generate local ringing.

2. If a 180 (Ringing) has been received but there are no incoming media packets, generate local ringing.

3. If a 180 (Ringing) has been received and there are incoming media packets, play them and do not generate local ringing.

Note that a 180 (Ringing) response means that the callee is being alerted, and a UAS should send such a response if the callee is being alerted, regardless of the status of the early media session.
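
The three rules of this example policy reduce to a single boolean decision, sketched below. The function name and its two inputs are hypothetical. How a UA detects that media packets are arriving (and whether it inspects them for silence or comfort noise) is left out of the sketch.

```python
def should_generate_local_ringing(received_180: bool, media_arriving: bool) -> bool:
    """Apply the example POTS-like local policy for ringing tone generation.

    received_180:   a 180 (Ringing) response has been received.
    media_arriving: media packets are currently arriving at the UAC.

    Incoming media packets are played in any case; this function only
    decides whether the UAC generates a ringing tone locally.
    """
    if not received_180:
        # Rule 1: without a 180 (Ringing), never generate local ringing.
        return False
    # Rule 2: ring locally while no media packets are arriving.
    # Rule 3: stop ringing and play the incoming media once they arrive.
    return not media_arriving
```

Because the inputs can change during session establishment, a UA following this policy would re-evaluate the function whenever a 180 (Ringing) arrives or media packets start or stop arriving.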

At first sight, such a policy may look difficult to implement in decomposed UAs (i.e., media gateway controller and media gateway), but this policy is the same as the one described in Section 2, which must be implemented by any UA. That is, any UA should play incoming media packets (and stop local ringing tone generation if it was being performed) in order to avoid media clipping, even if the 200 (OK) response has not arrived. So, the tools to implement this early media policy are already available to any UA that uses SIP.

Note that, while it is not desirable to standardize a common local policy to be followed by every SIP UA, a particular subset of more or less homogeneous SIP UAs could use the same local policy by convention. Examples of such subsets of SIP UAs may be "all the PSTN/SIP gateways" or "every 3GPP IMS (Third Generation Partnership Project Internet Multimedia System) terminal". However, defining the particular common policy that such groups of SIP devices may use is outside the scope of this document.

3.3. Absence of an Early Media Indicator

SIP, as opposed to other signalling protocols, does not provide an early media indicator. That is, there is no information about the presence or absence of early media in SIP. Such an indicator could be potentially used to avoid the generation of local ringing tone by the UAC when UAS intends to provide an in-band ringing tone or some type of announcement. However, in the majority of the cases, such an indicator would be of little use due to the way SIP works.

One important reason limiting the benefit of a potential early media indicator is the loose coupling between SIP signalling and the media path. SIP signalling traverses a different path than the media. The media path is typically optimized to reduce the end-to-end delay (e.g., minimum number of intermediaries), while the SIP signalling path typically traverses a number of proxies providing different services for the session. Hence, it is very likely that the media packets with early media reach the UAC before any SIP message that could contain an early media indicator.

Nevertheless, sometimes SIP responses arrive at the UAC before any media packet. There are situations in which the UAS intends to send early media but cannot do it straight away. For example, UAs using Interactive Connectivity Establishment (ICE) [6] may need to exchange several Simple Traversal of the UDP Protocol through NAT (STUN) messages before being able to exchange media. In this situation, an early media indicator would keep the UAC from generating a local ringing tone during this time. However, while the early media is not arriving at the UAC, the user would not be aware that the remote user is being alerted, even though a 180 (Ringing) had been received. Therefore, a better solution would be to apply a local ringing tone until the early media packets could be sent from the UAS to the UAC. This solution does not require any early media indicator.

Note that migrations from local ringing tone to early media at the UAC happen in the presence of forking as well; one UAS sends a 180 (Ringing) response, and later, another UAS starts sending early media.

3.4. Applicability of the Gateway Model

Section 3 described some of the limitations of the gateway model. It produces media clipping in forking scenarios and requires media detection to generate local ringing properly. These issues are addressed by the application server model, described in Section 4, which is the recommended way of generating early media that is not continuous with the regular media generated during the session.

The gateway model is, therefore, acceptable in situations where the UA cannot distinguish between early media and regular media. A PSTN gateway is an example of this type of situation. The PSTN gateway receives media from the PSTN over a circuit, and sends it to the IP network. The gateway is not aware of the contents of the media, and it does not exactly know when the transition from early to regular media takes place. From the PSTN perspective, the circuit is a continuous source of media.

4. The Application Server Model

The application server model consists of having the UAS behave as an application server to establish early media sessions with the UAC. The UAC indicates support for the early-session disposition type (defined in [2]) using the early-session option tag. This way, UASs know that they can keep offer/answer exchanges for early media (early-session disposition type) separate from regular media (session disposition type).
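
As an illustration, a UAS could check for this option tag roughly as follows. The sketch assumes that the UAC lists the tag in one or more Supported header fields and that the UAS already has the raw header field values as strings. The function name is hypothetical, and the comparison is done case-insensitively as a simplifying assumption.

```python
from typing import Iterable


def supports_early_session(supported_values: Iterable[str]) -> bool:
    """Return True if any Supported header value lists early-session.

    supported_values holds the raw values of the Supported header
    fields found in the INVITE, e.g. ["100rel, early-session"].
    """
    for value in supported_values:
        for tag in value.split(","):
            if tag.strip().lower() == "early-session":
                return True
    return False
```

A UAS that gets `False` here would not use the early-session disposition type with that UAC.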

Sending early media using a different offer/answer exchange than the one used for sending regular media helps avoid media clipping in cases of forking. The UAC can reject or mute new offers for early media without muting the sessions that will carry media when the original INVITE is accepted. The UAC can give priority to media received over the latter sessions. This way, the application server model transitions from early to regular media at the right moment.

Having a separate offer/answer exchange for early media also helps UACs decide whether or not local ringing should be generated. If a new early session is established and that early session contains at least an audio stream, the UAC can assume that there will be incoming early media and it can then avoid generating local ringing.
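
The check for an audio stream can be sketched as a scan of the early session description for audio media lines. This is a simplified illustration, not a full SDP parser: the function name is hypothetical, and treating a stream with port 0 as absent is an assumption based on the usual offer/answer convention for rejected streams.

```python
def early_session_has_audio(sdp: str) -> bool:
    """Return True if the session description contains a usable audio stream.

    Looks for media lines of the form "m=audio <port> <proto> <fmt> ..."
    and ignores audio streams whose port is 0.
    """
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if not line.startswith("m="):
            continue
        fields = line[2:].split()
        if len(fields) < 2 or fields[0] != "audio":
            continue
        # The port field may carry a "/<number of ports>" suffix.
        port = fields[1].split("/")[0]
        if port.isdigit() and int(port) != 0:
            return True
    return False
```

A UAC using this sketch would skip local ringing once `early_session_has_audio` returns `True` for a newly established early session.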

An alternative model would include the addition of a new stream, with an "early media" label, to the original session between the UAC and the UAS using an UPDATE instead of establishing a new early session. We have chosen to establish a new early session to be coherent with the mechanism used by application servers that are NOT co-located with the UAS. This way, the UAS uses the same mechanism as any application server in the network to interact with the UAC.

4.1. In-Band Versus Out-of-Band Session Progress Information

Note that, even when the application server model is used, a UA will have to choose which early media sessions are muted and which ones are rendered to the user. In order to make this choice easier for UAs, it is strongly recommended that information that is not essential for the session not be transmitted using early media. For instance, UAs should not use early media to send special ringing tones. The status code and the reason phrase in SIP can already inform the remote user about the progress of session establishment, without incurring the problems associated with early media.

5. Alert-Info Header Field

The Alert-Info header field allows specifying an alternative ringing content, such as ringing tone, to the UAC. This header field tells the UAC which tone should be played in case local ringing is generated, but it does not tell the UAC when to generate local ringing. A UAC should follow the rules described above for ringing tone generation in both models. If, after following those rules, the UAC decides to play local ringing, it can then use the Alert-Info header field to generate it.
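
The split between "when to ring" and "what to ring with" can be sketched as follows. The function name and the `default_tone` placeholder are hypothetical. The sketch assumes the header field value carries a URI enclosed in angle brackets, uses only the first such URI, and ignores any header parameters.

```python
from typing import Optional


def ringing_tone_source(
    generate_local_ringing: bool,
    alert_info: Optional[str],
    default_tone: str = "default-ringing-tone",
) -> Optional[str]:
    """Choose the tone to play once the ringing decision has been made.

    generate_local_ringing is the outcome of the UAC's local policy.
    alert_info is the raw Alert-Info header field value, if present.
    Returns None when no local ringing should be generated.
    """
    if not generate_local_ringing:
        # Alert-Info never triggers ringing by itself.
        return None
    if alert_info:
        start = alert_info.find("<")
        end = alert_info.find(">", start + 1)
        if start != -1 and end != -1:
            uri = alert_info[start + 1:end].strip()
            if uri:
                return uri
    return default_tone
```

Note that the Alert-Info value is consulted only after the local policy has already decided that ringing should be generated.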

6. Security Considerations

SIP uses the offer/answer model [3] to establish early sessions in both the gateway and the application server models. User Agents (UAs) generate a session description, which contains the transport address (i.e., IP address plus port) where they want to receive media, and send it to their peer in a SIP message. When media packets arrive at this transport address, the UA assumes that they come from the receiver of the SIP message carrying the session description. Nevertheless, attackers may attempt to gain access to the contents of the SIP message and send packets to the transport address contained in the session description. To prevent this situation, UAs SHOULD encrypt their session descriptions (e.g., using S/MIME).

Still, even if a UA encrypts its session descriptions, an attacker may try to guess the transport address used by the UA and send media packets to that address. Guessing such a transport address is sometimes easier than it may seem because many UAs always pick the same initial media port. To prevent this situation, UAs SHOULD use media-level authentication mechanisms such as the Secure Realtime Transport Protocol (SRTP) [7]. In addition, UAs that wish to keep their communications confidential SHOULD use media-level encryption mechanisms (e.g., SRTP [7]).

Attackers may attempt to make a UA send media to a victim as part of a DoS attack. This can be done by sending a session description with the victim's transport address to the UA. To prevent this attack, the UA SHOULD engage in a handshake with the owner of the transport address received in a session description (just verifying willingness to receive media) before sending a large amount of data to the transport address. This check can be performed by using a connection oriented transport protocol, by using STUN [8] in an end-to-end fashion, or by the key exchange in SRTP [7].

In any event, note that the previous security considerations are not early media specific, but apply to the usage of the offer/answer model in SIP to establish sessions in general.

Additionally, an early media-specific risk (roughly speaking, equivalent to forms of "toll fraud" in the PSTN) attempts to exploit the different charging policies some operators apply to early and regular media. When UAs are allowed to exchange early media for free, but are required to pay for regular media sessions, rogue UAs may try to establish a bidirectional early media session and never send a 200 (OK) response for the INVITE.

On the other hand, some application servers (e.g., Interactive Voice Response systems) use bidirectional early media to obtain information from the callers (e.g., the PIN code of a calling card). So, we do not recommend that operators disallow bidirectional early media. Instead, operators should consider charging for early media exchanges that last too long, or stopping them at the media level (according to the operator's policy).
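
One way an operator might express such a policy is sketched below in Python. This is an assumption-laden illustration, not a recommended implementation: the function name, the 30-second free period, and the stop_instead_of_charge flag are all hypothetical, and real policies are operator-specific.

```python
from enum import Enum


class EarlyMediaAction(Enum):
    ALLOW = "allow"    # keep relaying early media at no charge
    CHARGE = "charge"  # start charging for the early media exchange
    STOP = "stop"      # stop the exchange at the media level


def early_media_action(elapsed_seconds, answered,
                       free_seconds=30.0, stop_instead_of_charge=False):
    """Decide what to do with a bidirectional early media exchange.

    elapsed_seconds: time since early media started flowing.
    answered: True once a 200 (OK) for the INVITE has been seen, after
              which the session is regular media and this early media
              policy no longer applies.
    free_seconds: assumed length of the free early media period.
    """
    if answered or elapsed_seconds <= free_seconds:
        return EarlyMediaAction.ALLOW
    if stop_instead_of_charge:
        return EarlyMediaAction.STOP
    return EarlyMediaAction.CHARGE


if __name__ == "__main__":
    print(early_media_action(10, answered=False))
    print(early_media_action(45, answered=False))
    print(early_media_action(45, answered=False, stop_instead_of_charge=True))
```

A short free period keeps legitimate uses, such as an IVR collecting a PIN code, unaffected while limiting how long a rogue UA can hold a free bidirectional session.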

7. Acknowledgments

Jon Peterson provided useful ideas on the separation between the gateway model and the application server model.

Paul Kyzivat, Christer Holmberg, Bill Marshall, Francois Audet, John Hearty, Adam Roach, Eric Burger, Rohan Mahy, and Allison Mankin provided useful comments and suggestions.

8. References

8.1. Normative References

[1] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[2] Camarillo, G., "The Early Session Disposition Type for the Session Initiation Protocol (SIP)", RFC 3959, December 2004.

[3] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.

8.2. Informative References

[4] Rosenberg, J. and H. Schulzrinne, "Reliability of Provisional Responses in Session Initiation Protocol (SIP)", RFC 3262, June 2002.

[5] Rosenberg, J., "The Session Initiation Protocol (SIP) UPDATE Method", RFC 3311, October 2002.

[6] Rosenberg, J., "Interactive connectivity establishment (ICE): a methodology for network address translator (NAT) traversal for the session initiation protocol (SIP)", Work in progress, July 2003.

[7] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.

[8] Rosenberg, J., Weinberger, J., Huitema, C., and R. Mahy, "STUN - Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs)", RFC 3489, March 2003.

[9] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

Authors' Addresses

   Gonzalo Camarillo
   Ericsson
   Advanced Signalling Research Lab.
   FIN-02420 Jorvas
   Finland

   EMail:  Gonzalo.Camarillo@ericsson.com

   Henning Schulzrinne
   Dept. of Computer Science
   Columbia University
   1214 Amsterdam Avenue, MC 0401
   New York, NY 10027
   USA

   EMail:  schulzrinne@cs.columbia.edu

Full Copyright Statement

Copyright (C) The Internet Society (2004).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the IETF's procedures with respect to rights in IETF Documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.
