Internet Engineering Task Force (IETF)                        D. Burnett
Request for Comments: 6787                                         Voxeo
Category: Standards Track                                  S. Shanmugham
ISSN: 2070-1721                                      Cisco Systems, Inc.
                                                           November 2012
Media Resource Control Protocol Version 2 (MRCPv2)
Abstract
The Media Resource Control Protocol Version 2 (MRCPv2) allows client hosts to control media service resources such as speech synthesizers, recognizers, verifiers, and identifiers residing in servers on the network. MRCPv2 is not a "stand-alone" protocol -- it relies on other protocols, such as the Session Initiation Protocol (SIP), to coordinate MRCPv2 clients and servers and manage sessions between them, and the Session Description Protocol (SDP) to describe, discover, and exchange capabilities. It also depends on SIP and SDP to establish the media sessions and associated parameters between the media source or sink and the media server. Once this is done, the MRCPv2 exchange operates over the control session established above, allowing the client to control the media processing resources on the speech resource server.
Status of This Memo
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741.
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6787.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.
Table of Contents
1. Introduction
2. Document Conventions
   2.1. Definitions
   2.2. State-Machine Diagrams
   2.3. URI Schemes
3. Architecture
   3.1. MRCPv2 Media Resource Types
   3.2. Server and Resource Addressing
4. MRCPv2 Basics
   4.1. Connecting to the Server
   4.2. Managing Resource Control Channels
   4.3. SIP Session Example
   4.4. Media Streams and RTP Ports
   4.5. MRCPv2 Message Transport
   4.6. MRCPv2 Session Termination
5. MRCPv2 Specification
   5.1. Common Protocol Elements
   5.2. Request
   5.3. Response
   5.4. Status Codes
   5.5. Events
6. MRCPv2 Generic Methods, Headers, and Result Structure
   6.1. Generic Methods
      6.1.1. SET-PARAMS
      6.1.2. GET-PARAMS
   6.2. Generic Message Headers
      6.2.1. Channel-Identifier
      6.2.2. Accept
      6.2.3. Active-Request-Id-List
      6.2.4. Proxy-Sync-Id
      6.2.5. Accept-Charset
      6.2.6. Content-Type
      6.2.7. Content-ID
      6.2.8. Content-Base
      6.2.9. Content-Encoding
      6.2.10. Content-Location
      6.2.11. Content-Length
      6.2.12. Fetch Timeout
      6.2.13. Cache-Control
      6.2.14. Logging-Tag
      6.2.15. Set-Cookie
      6.2.16. Vendor-Specific Parameters
   6.3. Generic Result Structure
      6.3.1. Natural Language Semantics Markup Language
7. Resource Discovery
8. Speech Synthesizer Resource
   8.1. Synthesizer State Machine
   8.2. Synthesizer Methods
   8.3. Synthesizer Events
   8.4. Synthesizer Header Fields
      8.4.1. Jump-Size
      8.4.2. Kill-On-Barge-In
      8.4.3. Speaker-Profile
      8.4.4. Completion-Cause
      8.4.5. Completion-Reason
      8.4.6. Voice-Parameter
      8.4.7. Prosody-Parameters
      8.4.8. Speech-Marker
      8.4.9. Speech-Language
      8.4.10. Fetch-Hint
      8.4.11. Audio-Fetch-Hint
      8.4.12. Failed-URI
      8.4.13. Failed-URI-Cause
      8.4.14. Speak-Restart
      8.4.15. Speak-Length
      8.4.16. Load-Lexicon
      8.4.17. Lexicon-Search-Order
   8.5. Synthesizer Message Body
      8.5.1. Synthesizer Speech Data
      8.5.2. Lexicon Data
   8.6. SPEAK Method
   8.7. STOP
   8.8. BARGE-IN-OCCURRED
   8.9. PAUSE
   8.10. RESUME
   8.11. CONTROL
   8.12. SPEAK-COMPLETE
   8.13. SPEECH-MARKER
   8.14. DEFINE-LEXICON
9. Speech Recognizer Resource
   9.1. Recognizer State Machine
   9.2. Recognizer Methods
   9.3. Recognizer Events
   9.4. Recognizer Header Fields
      9.4.1. Confidence-Threshold
      9.4.2. Sensitivity-Level
      9.4.3. Speed-Vs-Accuracy
      9.4.4. N-Best-List-Length
      9.4.5. Input-Type
      9.4.6. No-Input-Timeout
      9.4.7. Recognition-Timeout
      9.4.8. Waveform-URI
      9.4.9. Media-Type
      9.4.10. Input-Waveform-URI
      9.4.11. Completion-Cause
      9.4.12. Completion-Reason
      9.4.13. Recognizer-Context-Block
      9.4.14. Start-Input-Timers
      9.4.15. Speech-Complete-Timeout
      9.4.16. Speech-Incomplete-Timeout
      9.4.17. DTMF-Interdigit-Timeout
      9.4.18. DTMF-Term-Timeout
      9.4.19. DTMF-Term-Char
      9.4.20. Failed-URI
      9.4.21. Failed-URI-Cause
      9.4.22. Save-Waveform
      9.4.23. New-Audio-Channel
      9.4.24. Speech-Language
      9.4.25. Ver-Buffer-Utterance
      9.4.26. Recognition-Mode
      9.4.27. Cancel-If-Queue
      9.4.28. Hotword-Max-Duration
      9.4.29. Hotword-Min-Duration
      9.4.30. Interpret-Text
      9.4.31. DTMF-Buffer-Time
      9.4.32. Clear-DTMF-Buffer
      9.4.33. Early-No-Match
      9.4.34. Num-Min-Consistent-Pronunciations
      9.4.35. Consistency-Threshold
      9.4.36. Clash-Threshold
      9.4.37. Personal-Grammar-URI
      9.4.38. Enroll-Utterance
      9.4.39. Phrase-Id
      9.4.40. Phrase-NL
      9.4.41. Weight
      9.4.42. Save-Best-Waveform
      9.4.43. New-Phrase-Id
      9.4.44. Confusable-Phrases-URI
      9.4.45. Abort-Phrase-Enrollment
   9.5. Recognizer Message Body
      9.5.1. Recognizer Grammar Data
      9.5.2. Recognizer Result Data
      9.5.3. Enrollment Result Data
      9.5.4. Recognizer Context Block
   9.6. Recognizer Results
      9.6.1. Markup Functions
      9.6.2. Overview of Recognizer Result Elements and Their Relationships
      9.6.3. Elements and Attributes
   9.7. Enrollment Results
      9.7.1. <num-clashes> Element
      9.7.2. <num-good-repetitions> Element
      9.7.3. <num-repetitions-still-needed> Element
      9.7.4. <consistency-status> Element
      9.7.5. <clash-phrase-ids> Element
      9.7.6. <transcriptions> Element
      9.7.7. <confusable-phrases> Element
   9.8. DEFINE-GRAMMAR
   9.9. RECOGNIZE
   9.10. STOP
   9.11. GET-RESULT
   9.12. START-OF-INPUT
   9.13. START-INPUT-TIMERS
   9.14. RECOGNITION-COMPLETE
   9.15. START-PHRASE-ENROLLMENT
   9.16. ENROLLMENT-ROLLBACK
   9.17. END-PHRASE-ENROLLMENT
   9.18. MODIFY-PHRASE
   9.19. DELETE-PHRASE
   9.20. INTERPRET
   9.21. INTERPRETATION-COMPLETE
   9.22. DTMF Detection
10. Recorder Resource
   10.1. Recorder State Machine
   10.2. Recorder Methods
   10.3. Recorder Events
   10.4. Recorder Header Fields
      10.4.1. Sensitivity-Level
      10.4.2. No-Input-Timeout
      10.4.3. Completion-Cause
      10.4.4. Completion-Reason
      10.4.5. Failed-URI
      10.4.6. Failed-URI-Cause
      10.4.7. Record-URI
      10.4.8. Media-Type
      10.4.9. Max-Time
      10.4.10. Trim-Length
      10.4.11. Final-Silence
      10.4.12. Capture-On-Speech
      10.4.13. Ver-Buffer-Utterance
      10.4.14. Start-Input-Timers
      10.4.15. New-Audio-Channel
   10.5. Recorder Message Body
   10.6. RECORD
   10.7. STOP
   10.8. RECORD-COMPLETE
   10.9. START-INPUT-TIMERS
   10.10. START-OF-INPUT
11. Speaker Verification and Identification
   11.1. Speaker Verification State Machine
   11.2. Speaker Verification Methods
   11.3. Verification Events
   11.4. Verification Header Fields
      11.4.1. Repository-URI
      11.4.2. Voiceprint-Identifier
      11.4.3. Verification-Mode
      11.4.4. Adapt-Model
      11.4.5. Abort-Model
      11.4.6. Min-Verification-Score
      11.4.7. Num-Min-Verification-Phrases
      11.4.8. Num-Max-Verification-Phrases
      11.4.9. No-Input-Timeout
      11.4.10. Save-Waveform
      11.4.11. Media-Type
      11.4.12. Waveform-URI
      11.4.13. Voiceprint-Exists
      11.4.14. Ver-Buffer-Utterance
      11.4.15. Input-Waveform-URI
      11.4.16. Completion-Cause
      11.4.17. Completion-Reason
      11.4.18. Speech-Complete-Timeout
      11.4.19. New-Audio-Channel
      11.4.20. Abort-Verification
      11.4.21. Start-Input-Timers
   11.5. Verification Message Body
      11.5.1. Verification Result Data
      11.5.2. Verification Result Elements
   11.6. START-SESSION
   11.7. END-SESSION
   11.8. QUERY-VOICEPRINT
   11.9. DELETE-VOICEPRINT
   11.10. VERIFY
   11.11. VERIFY-FROM-BUFFER
   11.12. VERIFY-ROLLBACK
   11.13. STOP
   11.14. START-INPUT-TIMERS
   11.15. VERIFICATION-COMPLETE
   11.16. START-OF-INPUT
   11.17. CLEAR-BUFFER
   11.18. GET-INTERMEDIATE-RESULT
12. Security Considerations
   12.1. Rendezvous and Session Establishment
   12.2. Control Channel Protection
   12.3. Media Session Protection
   12.4. Indirect Content Access
   12.5. Protection of Stored Media
   12.6. DTMF and Recognition Buffers
   12.7. Client-Set Server Parameters
   12.8. DELETE-VOICEPRINT and Authorization
13. IANA Considerations
   13.1. New Registries
      13.1.1. MRCPv2 Resource Types
      13.1.2. MRCPv2 Methods and Events
      13.1.3. MRCPv2 Header Fields
      13.1.4. MRCPv2 Status Codes
      13.1.5. Grammar Reference List Parameters
      13.1.6. MRCPv2 Vendor-Specific Parameters
   13.2. NLSML-Related Registrations
      13.2.1. 'application/nlsml+xml' Media Type Registration
   13.3. NLSML XML Schema Registration
   13.4. MRCPv2 XML Namespace Registration
   13.5. Text Media Type Registrations
      13.5.1. text/grammar-ref-list
   13.6. 'session' URI Scheme Registration
   13.7. SDP Parameter Registrations
      13.7.1. Sub-Registry "proto"
      13.7.2. Sub-Registry "att-field (media-level)"
14. Examples
   14.1. Message Flow
   14.2. Recognition Result Examples
      14.2.1. Simple ASR Ambiguity
      14.2.2. Mixed Initiative
      14.2.3. DTMF Input
      14.2.4. Interpreting Meta-Dialog and Meta-Task Utterances
      14.2.5. Anaphora and Deixis
      14.2.6. Distinguishing Individual Items from Sets with One Member
      14.2.7. Extensibility
15. ABNF Normative Definition
16. XML Schemas
   16.1. NLSML Schema Definition
   16.2. Enrollment Results Schema Definition
   16.3. Verification Results Schema Definition
17. References
   17.1. Normative References
   17.2. Informative References
Appendix A. Contributors
Appendix B. Acknowledgements

1.  Introduction
MRCPv2 is designed to allow a client device to control media processing resources on the network. Some of these media processing resources include speech recognition engines, speech synthesis engines, speaker verification, and speaker identification engines. MRCPv2 enables the implementation of distributed Interactive Voice Response platforms using VoiceXML [W3C.REC-voicexml20-20040316] browsers or other client applications while maintaining separate back-end speech processing capabilities on specialized speech processing servers. MRCPv2 is based on the earlier Media Resource Control Protocol (MRCP) [RFC4463] developed jointly by Cisco Systems, Inc., Nuance Communications, and Speechworks, Inc. Although some of the method names are similar, the way in which these methods are communicated is different. There are also more resources and more methods for each resource. The first version of MRCP was essentially taken only as input to the development of this protocol. There is no expectation that an MRCPv2 client will work with an MRCPv1 server or vice versa. There is no migration plan or gateway definition between the two protocols.
The protocol requirements of Speech Services Control (SPEECHSC) [RFC4313] include that the solution be capable of reaching a media processing server, setting up communication channels to the media resources, and sending and receiving control messages and media streams to/from the server. The Session Initiation Protocol (SIP) [RFC3261] meets these requirements.
The proprietary version of MRCP ran over the Real Time Streaming Protocol (RTSP) [RFC2326]. At the time work on MRCPv2 was begun, the consensus was that this use of RTSP would break the RTSP protocol or cause backward-compatibility problems, something forbidden by Section 3.2 of [RFC4313]. This is the reason why MRCPv2 does not run over RTSP.
MRCPv2 leverages these capabilities by building upon SIP and the Session Description Protocol (SDP) [RFC4566]. MRCPv2 uses SIP to set up and tear down media and control sessions with the server. In addition, the client can use a SIP re-INVITE method (an INVITE dialog sent within an existing SIP session) to change the characteristics of these media and control sessions while maintaining the SIP dialog between the client and server. SDP is used to describe the parameters of the media sessions associated with that dialog. It is mandatory to support SIP as the session establishment protocol to ensure interoperability. Other protocols can be used for session establishment by prior agreement. This document only describes the use of SIP and SDP.
MRCPv2 uses SIP and SDP to create the speech client/server dialog and set up the media channels to the server. It also uses SIP and SDP to establish MRCPv2 control sessions between the client and the server for each media processing resource required for that dialog. The MRCPv2 protocol exchange between the client and the media resource is carried on that control session. MRCPv2 exchanges do not change the state of the SIP dialog, the media sessions, or other parameters of the dialog initiated via SIP. It controls and affects the state of the media processing resource associated with the MRCPv2 session(s).
MRCPv2 defines the messages to control the different media processing resources and the state machines required to guide their operation. It also describes how these messages are carried over a transport-layer protocol such as the Transmission Control Protocol (TCP) [RFC0793] or the Transport Layer Security (TLS) Protocol [RFC5246]. (Note: the Stream Control Transmission Protocol (SCTP) [RFC4960] is a viable transport for MRCPv2 as well, but the mapping onto SCTP is not described in this specification.)
2.  Document Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
Since many of the definitions and syntax are identical to those for the Hypertext Transfer Protocol -- HTTP/1.1 [RFC2616], this specification refers to the section where they are defined rather than copying it. For brevity, [HX.Y] is to be taken to refer to Section X.Y of RFC 2616.
All the mechanisms specified in this document are described in both prose and an augmented Backus-Naur form (ABNF [RFC5234]).
The complete message format in ABNF form is provided in Section 15 and is the normative format definition. Note that productions may be duplicated within the main body of the document for reading convenience. If a production in the body of the text conflicts with one in the normative definition, the latter rules.
2.1.  Definitions
Media Resource
   An entity on the speech processing server that can be controlled through MRCPv2.

MRCP Server
   Aggregate of one or more "Media Resource" entities on a server, exposed through MRCPv2. Often, 'server' in this document refers to an MRCP server.

MRCP Client
   An entity controlling one or more Media Resources through MRCPv2 ("Client" for short).

DTMF
   Dual-Tone Multi-Frequency; a method of transmitting key presses in-band, either as actual tones (Q.23 [Q.23]) or as named tone events (RFC 4733 [RFC4733]).

Endpointing
   The process of automatically detecting the beginning and end of speech in an audio stream. This is critical both for speech recognition and for automated recording as one would find in voice mail systems.

Hotword Mode
   A mode of speech recognition where a stream of utterances is evaluated for match against a small set of command words. This is generally employed either to trigger some action or to control the subsequent grammar to be used for further recognition.
2.2.  State-Machine Diagrams
The state-machine diagrams in this document do not show every possible method call. Rather, they reflect the state of the resource based on the methods that have moved to IN-PROGRESS or COMPLETE states (see Section 5.3). Note that since PENDING requests essentially have not affected the resource yet and are in the queue to be processed, they are not reflected in the state-machine diagrams.
2.3.  URI Schemes
This document defines many protocol headers that contain URIs (Uniform Resource Identifiers [RFC3986]) or lists of URIs for referencing media. The entire document, including the Security Considerations section (Section 12), assumes that HTTP or HTTP over TLS (HTTPS) [RFC2818] will be used as the URI addressing scheme unless otherwise stated. However, implementations MAY support other schemes (such as 'file'), provided they have addressed any security considerations described in this document and any others particular to the specific scheme. For example, implementations where the client and server both reside on the same physical hardware and the file system is secured by traditional user-level file access controls could be reasonable candidates for supporting the 'file' scheme.
3.  Architecture
A system using MRCPv2 consists of a client that requires the generation and/or consumption of media streams and a media resource server that has the resources or "engines" to process these streams as input or generate these streams as output. The client uses SIP and SDP to establish an MRCPv2 control channel with the server to use its media processing resources. MRCPv2 servers are addressed using SIP URIs.
SIP uses SDP with the offer/answer model described in RFC 3264 [RFC3264] to set up the MRCPv2 control channels and describe their characteristics. A separate MRCPv2 session is needed to control each of the media processing resources associated with the SIP dialog between the client and server. Within a SIP dialog, the individual resource control channels for the different resources are added or removed through SDP offer/answer carried in a SIP re-INVITE transaction.
The server, through the SDP exchange, provides the client with a difficult-to-guess, unambiguous channel identifier and a TCP port number (see Section 4.2). The client MAY then open a new TCP connection with the server on this port number. Multiple MRCPv2 channels can share a TCP connection between the client and the server. All MRCPv2 messages exchanged between the client and the server carry the specified channel identifier that the server MUST ensure is unambiguous among all MRCPv2 control channels that are active on that server. The client uses this channel identifier to indicate the media processing resource associated with that channel. For information on message framing, see Section 5.
SIP also establishes the media sessions between the client (or other source/sink of media) and the MRCPv2 server using SDP "m=" lines.
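For example, the SDP offer from the client might carry an audio media line such as the following sketch; the port number and RTP payload types shown are illustrative placeholders rather than values mandated by this specification.

   m=audio 49170 RTP/AVP 0 96
   a=rtpmap:0 PCMU/8000
   a=rtpmap:96 telephone-event/8000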
One or more media processing resources may share a media session under a SIP session, or each media processing resource may have its own media session.
The following diagram shows the general architecture of a system that uses MRCPv2. To simplify the diagram, only a few resources are shown.
    MRCPv2 client                      MRCPv2 Media Resource Server
|--------------------|            |------------------------------------|
||------------------||            ||----------------------------------||
|| Application Layer||            ||Synthesis|Recognition|Verification||
||------------------||            || Engine  |  Engine   |   Engine   ||
||Media Resource API||            ||   ||    |    ||     |     ||     ||
||------------------||            ||Synthesis|Recognizer |  Verifier  ||
|| SIP    |  MRCPv2 ||            ||Resource | Resource  |  Resource  ||
||Stack   |         ||            ||    Media Resource Management     ||
||        |         ||            ||----------------------------------||
||------------------||            ||    SIP    |      MRCPv2          ||
||   TCP/IP Stack   ||---MRCPv2---||   Stack   |                      ||
||                  ||            ||----------------------------------||
||------------------||----SIP-----||           TCP/IP Stack           ||
|--------------------|            ||                                  ||
          |                       ||----------------------------------||
         SIP                      |------------------------------------|
          |                                        /
|-------------------|    RTP                      /
| Media Source/Sink |----------------------------/
|                   |
|-------------------|
Figure 1: Architectural Diagram
3.1.  MRCPv2 Media Resource Types
An MRCPv2 server may offer one or more of the following media processing resources to its clients.
Basic Synthesizer
   A speech synthesizer resource that has very limited capabilities and can generate its media stream exclusively from concatenated audio clips. The speech data is described using a limited subset of the Speech Synthesis Markup Language (SSML) [W3C.REC-speech-synthesis-20040907] elements. A basic synthesizer MUST support the SSML tags <speak>, <audio>, <say-as>, and <mark>; an illustrative fragment using only these elements appears after this list.

Speech Synthesizer
   A full-capability speech synthesis resource that can render speech from text. Such a synthesizer MUST have full SSML [W3C.REC-speech-synthesis-20040907] support.

Recorder
   A resource capable of recording audio and providing a URI pointer to the recording. A recorder MUST provide endpointing capabilities for suppressing silence at the beginning and end of a recording, and MAY also suppress silence in the middle of a recording. If such suppression is done, the recorder MUST maintain timing metadata to indicate the actual timestamps of the recorded media.

DTMF Recognizer
   A recognizer resource capable of extracting and interpreting Dual-Tone Multi-Frequency (DTMF) [Q.23] digits in a media stream and matching them against a supplied digit grammar. It could also do a semantic interpretation based on semantic tags in the grammar.

Speech Recognizer
   A full speech recognition resource that is capable of receiving a media stream containing audio and interpreting it to recognition results. It also has a natural language semantic interpreter to post-process the recognized data according to the semantic data in the grammar and provide semantic results along with the recognized input. The recognizer MAY also support enrolled grammars, where the client can enroll and create new personal grammars for use in future recognition operations.

Speaker Verifier
   A resource capable of verifying the authenticity of a claimed identity by matching a media stream containing spoken input to a pre-existing voiceprint. This may also involve matching the caller's voice against more than one voiceprint, also called multi-verification or speaker identification.
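The following SSML fragment is an illustrative sketch of content a basic synthesizer could render, using only the <speak>, <audio>, <say-as>, and <mark> elements it is required to support; the audio clip URI and the mark name are hypothetical.

   <?xml version="1.0"?>
   <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
          xml:lang="en-US">
     <audio src="http://www.example.com/prompts/welcome.wav"/>
     <say-as interpret-as="date">2012-11-30</say-as>
     <mark name="end-of-prompt"/>
   </speak>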
3.2.  Server and Resource Addressing
The MRCPv2 server is a generic SIP server, and is thus addressed by a SIP URI (RFC 3261 [RFC3261]).
For example:
sip:mrcpv2@example.net or sips:mrcpv2@example.net
4.  MRCPv2 Basics
MRCPv2 requires a connection-oriented transport-layer protocol such as TCP to guarantee reliable sequencing and delivery of MRCPv2 control messages between the client and the server. In order to meet the requirements for security enumerated in SPEECHSC requirements [RFC4313], clients and servers MUST implement TLS as well. One or more connections between the client and the server can be shared among different MRCPv2 channels to the server. The individual messages carry the channel identifier to differentiate messages on different channels. MRCPv2 encoding is text based with mechanisms to carry embedded binary data. This allows arbitrary data like recognition grammars, recognition results, synthesizer speech markup, etc., to be carried in MRCPv2 messages. For information on message framing, see Section 5.
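As a preview of the message format defined in Section 5, the following sketch shows the general shape of a text-encoded MRCPv2 request; the message length, request-id, channel identifier value, and content length are illustrative placeholders.

   MRCP/2.0 267 SPEAK 10000
   Channel-Identifier: 32AECB23433801@speechsynth
   Content-Type: application/ssml+xml
   Content-Length: 104

   (SSML content for the SPEAK request goes here)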
MRCPv2需要一个面向连接的传输层协议,如TCP,以保证客户机和服务器之间MRCPv2控制消息的可靠排序和传递。为了满足SPEECHSC要求[RFC4313]中列举的安全性要求,客户机和服务器也必须实现TLS。客户端和服务器之间的一个或多个连接可以在到服务器的不同MRCPv2通道之间共享。单个消息带有通道标识符,以区分不同通道上的消息。MRCPv2编码是基于文本的,具有携带嵌入式二进制数据的机制。这允许在MRCPv2消息中携带任意数据,如识别语法、识别结果、合成器语音标记等。有关消息框架的信息,请参见第5节。
MRCPv2 employs SIP, in conjunction with SDP, as the session establishment and management protocol. The client reaches an MRCPv2 server using conventional INVITE and other SIP requests for establishing, maintaining, and terminating SIP dialogs. The SDP offer/answer exchange model over SIP is used to establish a resource control channel for each resource. The SDP offer/answer exchange is also used to establish media sessions between the server and the source or sink of audio.
MRCPv2使用SIP和SDP作为会话建立和管理协议。客户端使用传统的INVITE和其他SIP请求到达MRCPv2服务器,以建立、维护和终止SIP对话框。SIP上的SDP提供/应答交换模型用于为每个资源建立资源控制通道。SDP提供/应答交换还用于在服务器和音频源或接收器之间建立媒体会话。
The client needs a separate MRCPv2 resource control channel to control each media processing resource under the SIP dialog. A unique channel identifier string identifies these resource control channels. The channel identifier is a difficult-to-guess, unambiguous string followed by an "@", then by a string token specifying the type of resource. The server generates the channel identifier and MUST make sure it does not clash with the identifier of any other MRCP channel currently allocated by that server.
客户端需要一个单独的MRCPv2资源控制通道来控制SIP对话框下的每个媒体处理资源。唯一的通道标识符字符串标识这些资源控制通道。通道标识符是一个难以猜测的明确字符串,后跟"@",然后是一个指定资源类型的字符串标记。服务器生成通道标识符,并且必须确保它不会与该服务器当前分配的任何其他MRCP通道的标识符冲突。
MRCPv2 defines the following IANA-registered types of media processing resources. Additional resource types and their associated methods/events and state machines may be added as described below in Section 13.
MRCPv2定义了以下IANA注册的媒体处理资源类型。如下文第13节所述,可添加其他资源类型及其相关方法/事件和状态机。
+---------------+----------------------+--------------+ | Resource Type | Resource Description | Described in | +---------------+----------------------+--------------+ | speechrecog | Speech Recognizer | Section 9 | | dtmfrecog | DTMF Recognizer | Section 9 | | speechsynth | Speech Synthesizer | Section 8 | | basicsynth | Basic Synthesizer | Section 8 | | speakverify | Speaker Verification | Section 11 | | recorder | Speech Recorder | Section 10 | +---------------+----------------------+--------------+
Table 1: Resource Types
表1:资源类型
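As a non-normative illustration of the channel identifier construction described above, the following Python sketch shows how a server might mint a Channel-Identifier from a difficult-to-guess session string and one of the resource types in Table 1. The function name and the RESOURCE_TYPES set are the editor's own and are not defined by MRCPv2.

import secrets

# Resource types from Table 1.
RESOURCE_TYPES = {
    "speechrecog", "dtmfrecog", "speechsynth",
    "basicsynth", "speakverify", "recorder",
}

def new_channel_identifier(session_string: str, resource_type: str) -> str:
    # Build a Channel-Identifier of the form <session-string>@<resource-type>.
    if resource_type not in RESOURCE_TYPES:
        raise ValueError("unknown MRCPv2 resource type: " + resource_type)
    return f"{session_string}@{resource_type}"

# One difficult-to-guess string per SIP dialog, shared by all of its channels.
session_string = secrets.token_hex(6).upper()
print(new_channel_identifier(session_string, "speechsynth"))
# e.g., 32AECB234338@speechsynth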
The SIP INVITE or re-INVITE transaction and the SDP offer/answer exchange it carries contain "m=" lines describing the resource control channel to be allocated. There MUST be one SDP "m=" line for each MRCPv2 resource to be used in the session. This "m=" line MUST have a media type field of "application" and a transport type field of either "TCP/MRCPv2" or "TCP/TLS/MRCPv2". The port number field of the "m=" line MUST contain the "discard" port of the transport protocol (port 9 for TCP) in the SDP offer from the client and MUST contain the TCP listen port on the server in the SDP answer. The client may then either set up a TCP or TLS connection to that server port or share an already established connection to that port. Since MRCPv2 allows multiple sessions to share the same TCP connection, multiple "m=" lines in a single SDP document MAY share the same port field value; MRCPv2 servers MUST NOT assume any relationship between resources using the same port other than the sharing of the communication channel.
SIP INVITE或REINVITE事务及其承载的SDP提供/应答交换包含描述要分配的资源控制通道的“m=”行。会话中使用的每个MRCPv2资源必须有一个SDP“m=”行。此“m=”行的媒体类型字段必须为“应用程序”,传输类型字段必须为“TCP/MRCPv2”或“TCP/TLS/MRCPv2”。“m=”行的端口号字段必须包含来自客户端的SDP报价中传输协议的“丢弃”端口(TCP的端口9),并且必须包含SDP应答中服务器上的TCP侦听端口。然后,客户端可以设置到该服务器端口的TCP或TLS连接,或者共享到该端口的已建立连接。由于MRCPv2允许多个会话共享相同的TCP连接,因此单个SDP文档中的多个“m=”行可能共享相同的端口字段值;MRCPv2服务器不得假设使用相同端口的资源之间存在任何关系,通信通道共享除外。
MRCPv2 resources do not use the port or format field of the "m=" line to distinguish themselves from other resources using the same channel. The client MUST specify the resource type identifier in the resource attribute associated with the control "m=" line of the SDP offer. The server MUST respond with the full Channel-Identifier (which includes the resource type identifier and a difficult-to-guess, unambiguous string) in the "channel" attribute associated with the control "m=" line of the SDP answer. To remain backwards compatible with conventional SDP usage, the format field of the "m=" line MUST have the arbitrarily selected value of "1".
MRCPv2资源不使用“m=”行的“端口”或“格式”字段来区别于使用相同通道的其他资源。客户机必须在与SDP报价的control“m=”行关联的资源属性中指定资源类型标识符。服务器必须在与SDP应答的控件“m=”行关联的“Channel”属性中使用完整的通道标识符(包括资源类型标识符和难以猜测的明确字符串)进行响应。为了与传统SDP用法保持向后兼容,“m=”行的格式字段必须具有任意选择的值“1”。
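The offer-side rules above can be illustrated with a small, non-normative Python sketch that assembles the control "m=" block of a client's SDP offer. The function name and its parameters are the editor's; only the SDP lines it emits come from this specification.

def control_media_lines(resource_type: str, cmid: str, use_tls: bool = False) -> str:
    # Return the control "m=" block for a client SDP offer (illustrative sketch).
    proto = "TCP/TLS/MRCPv2" if use_tls else "TCP/MRCPv2"
    return "\r\n".join([
        f"m=application 9 {proto} 1",   # discard port 9 in the offer, format field "1"
        "a=setup:active",               # client is the active party (RFC 4145)
        "a=connection:new",             # first channel on a new connection
        f"a=resource:{resource_type}",  # resource type requested by the client
        f"a=cmid:{cmid}",               # ties the channel to an audio "m=" line
    ])

print(control_media_lines("speechsynth", "1"))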
When the client wants to add a media processing resource to the session, it issues a new SDP offer, according to the procedures of RFC 3264 [RFC3264], in a SIP re-INVITE request.
当客户端想要向会话添加媒体处理资源时,它会根据RFC 3264[RFC3264]的过程在SIP重新邀请请求中发出新的SDP提议。
The SDP offer/answer exchange carried by this SIP transaction contains one or more additional control "m=" lines for the new resources to be allocated to the session. The server, on seeing the new "m=" line, allocates the resources (if they are available) and responds with a corresponding control "m=" line in the SDP answer carried in the SIP response. If the new resources are not available, the re-INVITE receives an error message, and existing media processing going on before the re-INVITE will continue as it was before. It is not possible to allocate more than one resource of each type. If a client requests more than one resource of any type, the server MUST behave as if the resources of that type (beyond the first one) are not available.
此SIP事务携带的SDP提议/应答交换包含一个或多个附加控制"m="行,用于分配给会话的新资源。服务器在看到新的"m="行时,分配资源(如果资源可用),并在SIP响应中携带的SDP应答中使用相应的控件"m="行进行响应。如果新资源不可用,重新邀请将收到一条错误消息,重新邀请之前正在进行的现有媒体处理将一如既往地继续进行。不可能为每种类型分配多个资源。如果客户机请求任何类型的多个资源,则服务器的行为必须与该类型的资源(第一个类型以外的资源)不可用一样。
MRCPv2 clients and servers using TCP as a transport protocol MUST use the procedures specified in RFC 4145 [RFC4145] for setting up the TCP connection, with the considerations described hereby. Similarly, MRCPv2 clients and servers using TCP/TLS as a transport protocol MUST use the procedures specified in RFC 4572 [RFC4572] for setting up the TLS connection, with the considerations described hereby. The a=setup attribute, as described in RFC 4145 [RFC4145], MUST be "active" for the offer from the client and MUST be "passive" for the answer from the MRCPv2 server. The a=connection attribute MUST have a value of "new" on the very first control "m=" line offer from the client to an MRCPv2 server. Subsequent control "m=" line offers from the client to the MRCP server MAY contain "new" or "existing", depending on whether the client wants to set up a new connection or share an existing connection, respectively. If the client specifies a value of "new", the server MUST respond with a value of "new". If the client specifies a value of "existing", the server MUST respond. The legal values in the response are "existing" if the server prefers to share an existing connection or "new" if not. In the latter case, the client MUST initiate a new transport connection.
使用TCP作为传输协议的MRCPv2客户机和服务器必须使用RFC 4145[RFC4145]中规定的程序来设置TCP连接,注意事项如下所述。类似地,使用TCP/TLS作为传输协议的MRCPv2客户机和服务器必须使用RFC 4572[RFC4572]中规定的程序来设置TLS连接,注意事项如下所述。RFC 4145[RFC4145]中所述的a=setup属性对于客户端的报价必须是“主动”的,对于MRCPv2服务器的应答必须是“被动”的。从客户端到MRCPv2服务器的第一个控件“m=”行提供上,a=connection属性的值必须为“new”。从客户机到MRCP服务器的后续control“m=”line报价可能包含“新建”或“现有”,具体取决于客户机是希望建立新连接还是共享现有连接。如果客户端指定一个值“new”,服务器必须用一个值“new”进行响应。如果客户机指定的值为“existing”,则服务器必须响应。如果服务器希望共享现有连接,则响应中的合法值为“现有”,否则为“新”。在后一种情况下,客户端必须启动新的传输连接。
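A non-normative sketch of the answering side's choice of the a=connection value, assuming the server exposes a simple preference flag for sharing an existing connection (the function and flag are illustrative only):

def answer_connection(offer_connection: str, server_prefers_existing: bool) -> str:
    # Pick the a=connection value for the SDP answer.
    if offer_connection == "new":
        return "new"                    # the answer MUST mirror "new"
    if offer_connection == "existing":
        # The server may share an existing connection or ask for a new one;
        # if it answers "new", the client MUST open a new transport connection.
        return "existing" if server_prefers_existing else "new"
    raise ValueError("unexpected a=connection value: " + offer_connection)

print(answer_connection("existing", server_prefers_existing=True))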
When the client wants to deallocate the resource from this session, it issues a new SDP offer, according to RFC 3264 [RFC3264], where the control "m=" line port MUST be set to 0. This SDP offer is sent in a SIP re-INVITE request. This deallocates the associated MRCPv2 identifier and resource. The server MUST NOT close the TCP or TLS connection if it is currently being shared among multiple MRCP channels. When all MRCP channels that may be sharing the connection are released and/or the associated SIP dialog is terminated, the client or server terminates the connection.
根据RFC 3264[RFC3264],当客户机希望从该会话中解除分配资源时,它会发出一个新的SDP提供,其中control“m=”line port必须设置为0。此SDP提供在SIP重新邀请请求中发送。这将取消分配关联的MRCPv2标识符和资源。如果TCP或TLS连接当前正在多个MRCP通道之间共享,则服务器不得关闭该连接。当释放可能共享连接的所有MRCP通道和/或终止关联的SIP对话框时,客户端或服务器终止连接。
When the client wants to tear down the whole session and all its resources, it MUST issue a SIP BYE request to close the SIP session. This will deallocate all the control channels and resources allocated under the session.
当客户端想要中断整个会话及其所有资源时,它必须发出SIP BYE请求以关闭SIP会话。这将取消分配会话下分配的所有控制通道和资源。
All servers MUST support TLS. Servers MAY use TCP without TLS in controlled environments (e.g., not in the public Internet) where both nodes are inside a protected perimeter, for example, preventing access to the MRCP server from remote nodes outside the controlled perimeter. It is up to the client, through the SDP offer, to choose which transport it wants to use for an MRCPv2 session. Aside from the exceptions given above, when using TCP, the "m=" lines MUST conform to RFC4145 [RFC4145], which describes the usage of SDP for connection-oriented transport. When using TLS, the SDP "m=" line for the control stream MUST conform to Connection-Oriented Media (COMEDIA) over TLS [RFC4572], which specifies the usage of SDP for establishing a secure connection-oriented transport over TLS.
所有服务器都必须支持TLS。服务器可在受控环境(例如,不在公共互联网中)中使用不带TLS的TCP,其中两个节点都位于受保护的周界内,例如,阻止受控周界外的远程节点访问MRCP服务器。客户可以通过SDP服务选择要用于MRCPv2会话的传输。除了上面给出的例外情况外,在使用TCP时,“m=”行必须符合RFC4145[RFC4145],这说明了SDP在面向连接的传输中的使用。使用TLS时,控制流的SDP“m=”行必须符合TLS上的面向连接的媒体(COMEDIA)[RFC4572],其中规定了SDP用于通过TLS建立安全的面向连接的传输。
This first example shows the power of using SIP to route to the appropriate resource. In the example, note the use of a request to a domain's speech server service in the INVITE to mresources@example.com. The SIP routing machinery in the domain locates the actual server, mresources@server.example.com, which gets returned in the 200 OK. Note that "cmid" is defined in Section 4.4.
第一个示例展示了使用SIP路由到适当资源的能力。在该示例中,请注意INVITE中使用了发往域语音服务器服务 mresources@example.com 的请求。域中的SIP路由机制定位实际服务器 mresources@server.example.com,并在200 OK中将其返回。注意,"cmid"的定义见第4.4节。
This example exchange adds a resource control channel for a synthesizer. Since a synthesizer also generates an audio stream, this interaction also creates a receive-only Real-Time Protocol (RTP) [RFC3550] media session for the server to send audio to. The SIP dialog with the media source/sink is independent of MRCP and is not shown.
此示例exchange为合成器添加了一个资源控制通道。由于合成器还生成音频流,因此此交互还为服务器创建仅接收实时协议(RTP)[RFC3550]媒体会话以向其发送音频。与媒体源/接收器的SIP对话框独立于MRCP,未显示。
C->S: INVITE sip:mresources@example.com SIP/2.0 Via:SIP/2.0/TCP client.atlanta.example.com:5060; branch=z9hG4bK74bf1 Max-Forwards:6 To:MediaServer <sip:mresources@example.com> From:sarvi <sip:sarvi@example.com>;tag=1928301774 Call-ID:a84b4c76e66710 CSeq:314161 INVITE Contact:<sip:sarvi@client.example.com> Content-Type:application/sdp Content-Length:...
v=0 o=sarvi 2890844526 2890844526 IN IP4 192.0.2.12 s=- c=IN IP4 192.0.2.12 t=0 0 m=application 9 TCP/MRCPv2 1 a=setup:active
a=connection:new a=resource:speechsynth a=cmid:1 m=audio 49170 RTP/AVP 0 a=rtpmap:0 pcmu/8000 a=recvonly a=mid:1
S->C: SIP/2.0 200 OK Via:SIP/2.0/TCP client.atlanta.example.com:5060; branch=z9hG4bK74bf1;received=192.0.32.10 To:MediaServer <sip:mresources@example.com>;tag=62784 From:sarvi <sip:sarvi@example.com>;tag=1928301774 Call-ID:a84b4c76e66710 CSeq:314161 INVITE Contact:<sip:mresources@server.example.com> Content-Type:application/sdp Content-Length:...
v=0 o=- 2890842808 2890842808 IN IP4 192.0.2.11 s=- c=IN IP4 192.0.2.11 t=0 0 m=application 32416 TCP/MRCPv2 1 a=setup:passive a=connection:new a=channel:32AECB234338@speechsynth a=cmid:1 m=audio 48260 RTP/AVP 0 a=rtpmap:0 pcmu/8000 a=sendonly a=mid:1
C->S: ACK sip:mresources@server.example.com SIP/2.0 Via:SIP/2.0/TCP client.atlanta.example.com:5060; branch=z9hG4bK74bf2 Max-Forwards:6 To:MediaServer <sip:mresources@example.com>;tag=62784 From:Sarvi <sip:sarvi@example.com>;tag=1928301774 Call-ID:a84b4c76e66710 CSeq:314161 ACK Content-Length:0
Example: Add Synthesizer Control Channel
示例:添加合成器控制通道
This example exchange continues from the previous figure and allocates an additional resource control channel for a recognizer. Since a recognizer would need to receive an audio stream for recognition, this interaction also updates the audio stream to sendrecv, making it a two-way RTP media session.
此示例交换从上图继续,并为识别器分配额外的资源控制通道。由于识别器需要接收用于识别的音频流,此交互还将音频流更新到sendrecv,使其成为双向RTP媒体会话。
C->S: INVITE sip:mresources@server.example.com SIP/2.0 Via:SIP/2.0/TCP client.atlanta.example.com:5060; branch=z9hG4bK74bf3 Max-Forwards:6 To:MediaServer <sip:mresources@example.com>;tag=62784 From:sarvi <sip:sarvi@example.com>;tag=1928301774 Call-ID:a84b4c76e66710 CSeq:314162 INVITE Contact:<sip:sarvi@client.example.com> Content-Type:application/sdp Content-Length:...
v=0 o=sarvi 2890844526 2890844527 IN IP4 192.0.2.12 s=- c=IN IP4 192.0.2.12 t=0 0 m=application 9 TCP/MRCPv2 1 a=setup:active a=connection:existing a=resource:speechsynth a=cmid:1 m=audio 49170 RTP/AVP 0 96 a=rtpmap:0 pcmu/8000 a=rtpmap:96 telephone-event/8000 a=fmtp:96 0-15 a=sendrecv a=mid:1 m=application 9 TCP/MRCPv2 1 a=setup:active a=connection:existing a=resource:speechrecog a=cmid:1
S->C: SIP/2.0 200 OK Via:SIP/2.0/TCP client.atlanta.example.com:5060; branch=z9hG4bK74bf3;received=192.0.32.10 To:MediaServer <sip:mresources@example.com>;tag=62784 From:sarvi <sip:sarvi@example.com>;tag=1928301774 Call-ID:a84b4c76e66710 CSeq:314162 INVITE
Contact:<sip:mresources@server.example.com> Content-Type:application/sdp Content-Length:...
v=0 o=- 2890842808 2890842809 IN IP4 192.0.2.11 s=- c=IN IP4 192.0.2.11 t=0 0 m=application 32416 TCP/MRCPv2 1 a=setup:passive a=connection:existing a=channel:32AECB234338@speechsynth a=cmid:1 m=audio 48260 RTP/AVP 0 96 a=rtpmap:0 pcmu/8000 a=rtpmap:96 telephone-event/8000 a=fmtp:96 0-15 a=sendrecv a=mid:1 m=application 32416 TCP/MRCPv2 1 a=setup:passive a=connection:existing a=channel:32AECB234338@speechrecog a=cmid:1
C->S: ACK sip:mresources@server.example.com SIP/2.0 Via:SIP/2.0/TCP client.atlanta.example.com:5060; branch=z9hG4bK74bf4 Max-Forwards:6 To:MediaServer <sip:mresources@example.com>;tag=62784 From:Sarvi <sip:sarvi@example.com>;tag=1928301774 Call-ID:a84b4c76e66710 CSeq:314162 ACK Content-Length:0
Example: Add Recognizer
示例:添加识别器
This example exchange continues from the previous figure and deallocates the recognizer channel. Since a recognizer no longer needs to receive an audio stream, this interaction also updates the RTP media session to recvonly.
此示例交换从上图继续,并取消分配识别器通道。由于识别器不再需要接收音频流,此交互还将RTP媒体会话更新为recvonly。
C->S: INVITE sip:mresources@server.example.com SIP/2.0 Via:SIP/2.0/TCP client.atlanta.example.com:5060; branch=z9hG4bK74bf5 Max-Forwards:6
To:MediaServer <sip:mresources@example.com>;tag=62784 From:sarvi <sip:sarvi@example.com>;tag=1928301774 Call-ID:a84b4c76e66710 CSeq:314163 INVITE Contact:<sip:sarvi@client.example.com> Content-Type:application/sdp Content-Length:...
v=0 o=sarvi 2890844526 2890844528 IN IP4 192.0.2.12 s=- c=IN IP4 192.0.2.12 t=0 0 m=application 9 TCP/MRCPv2 1 a=resource:speechsynth a=cmid:1 m=audio 49170 RTP/AVP 0 a=rtpmap:0 pcmu/8000 a=recvonly a=mid:1 m=application 0 TCP/MRCPv2 1 a=resource:speechrecog a=cmid:1
S->C: SIP/2.0 200 OK Via:SIP/2.0/TCP client.atlanta.example.com:5060; branch=z9hG4bK74bf5;received=192.0.32.10 To:MediaServer <sip:mresources@example.com>;tag=62784 From:sarvi <sip:sarvi@example.com>;tag=1928301774 Call-ID:a84b4c76e66710 CSeq:314163 INVITE Contact:<sip:mresources@server.example.com> Content-Type:application/sdp Content-Length:...
v=0 o=- 2890842808 2890842810 IN IP4 192.0.2.11 s=- c=IN IP4 192.0.2.11 t=0 0 m=application 32416 TCP/MRCPv2 1 a=channel:32AECB234338@speechsynth a=cmid:1 m=audio 48260 RTP/AVP 0 a=rtpmap:0 pcmu/8000 a=sendonly a=mid:1
m=application 0 TCP/MRCPv2 1 a=channel:32AECB234338@speechrecog a=cmid:1
C->S: ACK sip:mresources@server.example.com SIP/2.0 Via:SIP/2.0/TCP client.atlanta.example.com:5060; branch=z9hG4bK74bf6 Max-Forwards:6 To:MediaServer <sip:mresources@example.com>;tag=62784 From:Sarvi <sip:sarvi@example.com>;tag=1928301774 Call-ID:a84b4c76e66710 CSeq:314163 ACK Content-Length:0
Example: Deallocate Recognizer
示例:取消分配识别器
Since MRCPv2 resources either generate or consume media streams, the client or the server needs to associate media sessions with their corresponding resource or resources. More than one resource could be associated with a single media session or each resource could be assigned a separate media session. Also, note that more than one media session can be associated with a single resource if need be, but this scenario is not useful for the current set of resources. For example, a synthesizer and a recognizer could be associated to the same media session (m=audio line), if it is opened in "sendrecv" mode. Alternatively, the recognizer could have its own "sendonly" audio session, and the synthesizer could have its own "recvonly" audio session.
由于MRCPv2资源生成或使用媒体流,因此客户端或服务器需要将媒体会话与其相应的一个或多个资源相关联。可以将多个资源与单个媒体会话关联,也可以为每个资源分配单独的媒体会话。另外,请注意,如果需要,可以将多个媒体会话与单个资源关联,但此场景对当前资源集不有用。例如,如果合成器和识别器在“sendrecv”模式下打开,则可以将其关联到同一媒体会话(m=音频线)。或者,识别器可以有自己的“仅发送”音频会话,合成器可以有自己的“仅接收”音频会话。
The association between control channels and their corresponding media sessions is established using a new "resource channel media identifier" media-level attribute ("cmid"). Valid values of this attribute are the values of the "mid" attribute defined in RFC 5888 [RFC5888]. If there is more than one audio "m=" line, then each audio "m=" line MUST have a "mid" attribute. Each control "m=" line MAY have one or more "cmid" attributes that match the resource control channel to the "mid" attributes of the audio "m=" lines it is associated with. Note that if a control "m=" line does not have a "cmid" attribute it will not be associated with any media. The operations on such a resource will hence be limited. For example, if it was a recognizer resource, the RECOGNIZE method requires an associated media to process while the INTERPRET method does not. The formatting of the "cmid" attribute is described by the following ABNF:
使用新的“资源通道媒体标识符”媒体级属性(“cmid”)建立控制通道及其相应媒体会话之间的关联。此属性的有效值是RFC 5888[RFC5888]中定义的“mid”属性的值。如果有多个audio“m=”行,则每个audio“m=”行必须具有“mid”属性。每个控制“m=”行可能有一个或多个“cmid”属性,这些属性将资源控制频道与其关联的音频“m=”行的“mid”属性相匹配。请注意,如果控件“m=”行没有“cmid”属性,它将不会与任何媒体关联。因此,对此类资源的操作将受到限制。例如,如果它是识别器资源,则识别方法需要关联的媒体来处理,而解释方法则不需要。“cmid”属性的格式由以下ABNF描述:
cmid-attribute = "a=cmid:" identification-tag identification-tag = token
To allow this flexible mapping of media sessions to MRCPv2 control channels, a single audio "m=" line can be associated with multiple resources, or each resource can have its own audio "m=" line. For example, if the client wants to allocate a recognizer and a synthesizer and associate them with a single two-way audio stream, the SDP offer would contain two control "m=" lines and a single audio "m=" line with an attribute of "sendrecv". Each of the control "m=" lines would have a "cmid" attribute whose value matches the "mid" of the audio "m=" line. If, on the other hand, the client wants to allocate a recognizer and a synthesizer each with its own separate audio stream, the SDP offer would carry two control "m=" lines (one for the recognizer and another for the synthesizer) and two audio "m=" lines (one with the attribute "sendonly" and another with attribute "recvonly"). The "cmid" attribute of the recognizer control "m=" line would match the "mid" value of the "sendonly" audio "m=" line, and the "cmid" attribute of the synthesizer control "m=" line would match the "mid" attribute of the "recvonly" "m=" line.
为了实现媒体会话到MRCPv2控制通道的灵活映射,单个音频“m=”行可以与多个资源关联,或者每个资源可以有自己的音频“m=”行。例如,如果客户端希望分配识别器和合成器,并将它们与单个双向音频流关联,则SDP产品将包含两个control“m=”lines和一个属性为“sendrecv”的单个audio“m=”line。每个控件“m=”行都有一个“cmid”属性,其值与音频“m=”行的“mid”匹配。另一方面,如果客户机希望分配一个识别器和一个合成器,每个识别器和合成器都有各自独立的音频流,则SDP产品将携带两条control“m=”行(一条用于识别器,另一条用于合成器)和两条audio“m=”行(一条具有属性“sendonly”,另一条具有属性“RecvoOnly”)。识别器控件“m=”行的“cmid”属性将匹配“sendonly”audio“m=”行的“mid”值,合成器控件“m=”行的“cmid”属性将匹配“recvoOnly”m=”行的“mid”属性。
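A non-normative sketch of this association step, assuming the SDP has already been parsed into simple per-"m="-line dictionaries (the function name and dictionary layout are the editor's, not defined by this specification):

def associate_channels(control_lines: list, audio_lines: list) -> dict:
    # control_lines: e.g., {"resource": "speechrecog", "cmid": ["1"]}
    # audio_lines:   e.g., {"mid": "1", "direction": "sendrecv"}
    audio_by_mid = {a["mid"]: a for a in audio_lines}
    mapping = {}
    for ctrl in control_lines:
        # A control "m=" line without a "cmid" attribute has no associated media.
        mapping[ctrl["resource"]] = [audio_by_mid[c] for c in ctrl.get("cmid", [])]
    return mapping

print(associate_channels(
    [{"resource": "speechsynth", "cmid": ["1"]},
     {"resource": "speechrecog", "cmid": ["1"]}],
    [{"mid": "1", "direction": "sendrecv"}]))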
When a server receives media (e.g., audio) on a media session that is associated with more than one media processing resource, it is the responsibility of the server to receive and fork the media to the resources that need to consume it. If multiple resources in an MRCPv2 session are generating audio (or other media) to be sent on a single associated media session, it is the responsibility of the server either to multiplex the multiple streams onto the single RTP session or to contain an embedded RTP mixer (see RFC 3550 [RFC3550]) to combine the multiple streams into one. In the former case, the media stream will contain RTP packets generated by different sources, and hence the packets will have different Synchronization Source Identifiers (SSRCs). In the latter case, the RTP packets will contain multiple Contributing Source Identifiers (CSRCs) corresponding to the original streams before being combined by the mixer. If an MRCPv2 server implementation neither multiplexes nor mixes, it MUST disallow the client from associating multiple such resources to a single audio stream by rejecting the SDP offer with a SIP 488 "Not Acceptable" error. Note that there is a large installed base that will return a SIP 501 "Not Implemented" error in this case. To facilitate interoperability with this installed base, new implementations SHOULD treat a 501 in this context as a 488 when it is received from an element known to be a legacy implementation.
当服务器在与多个媒体处理资源相关联的媒体会话上接收媒体(例如音频)时,服务器负责接收媒体并将其转发给需要使用它的资源。如果MRCPv2会话中的多个资源正在生成要在单个相关媒体会话上发送的音频(或其他媒体),则服务器负责将多个流多路复用到单个RTP会话上,或包含嵌入式RTP混音器(请参阅RFC 3550[RFC3550]),以将多个流合并成一个流。在前一种情况下,媒体流将包含由不同源生成的RTP分组,因此分组将具有不同的同步源标识符(ssrc)。在后一种情况下,RTP分组在被混频器组合之前将包含与原始流相对应的多个贡献源标识符(csrc)。如果MRCPv2服务器实现既不多路复用也不混合,则必须通过拒绝带有SIP 488“不可接受”错误的SDP提供,禁止客户端将多个此类资源关联到单个音频流。请注意,在这种情况下,有大量的安装用户将返回SIP 501“未实现”错误。为了促进与该已安装基础的互操作性,当从已知为遗留实现的元素接收501时,新实现应将其视为488。
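The decision logic of the preceding paragraph might be sketched as follows (non-normative; the helper names and boolean inputs are the editor's):

def answer_status(shares_one_stream: bool, can_mix_or_multiplex: bool) -> int:
    # SIP status a server could return for an offer that ties several
    # media-generating resources to a single audio session.
    if shares_one_stream and not can_mix_or_multiplex:
        return 488          # Not Acceptable: reject the SDP offer
    return 200              # otherwise proceed with the answer

def effective_status(status: int, peer_is_legacy: bool) -> int:
    # Client-side view: treat 501 from a known legacy peer as 488.
    return 488 if (status == 501 and peer_is_legacy) else status

print(answer_status(True, False), effective_status(501, True))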
The MRCPv2 messages defined in this document are transported over a TCP or TLS connection between the client and the server. The method for setting up this transport connection and the resource control channel is discussed in Sections 4.1 and 4.2. Multiple resource control channels between a client and a server that belong to different SIP dialogs can share one or more TLS or TCP connections between them; the server and client MUST support this mode of operation. Clients and servers MUST use the MRCPv2 channel identifier, carried in the Channel-Identifier header field in individual MRCPv2 messages, to differentiate MRCPv2 messages from different resource channels (see Section 6.2.1 for details). All MRCPv2 servers MUST support TLS. Servers MAY use TCP without TLS in controlled environments (e.g., not in the public Internet) where both nodes are inside a protected perimeter, for example, preventing access to the MRCP server from remote nodes outside the controlled perimeter. It is up to the client to choose which mode of transport it wants to use for an MRCPv2 session.
本文档中定义的MRCPv2消息通过客户端和服务器之间的TCP或TLS连接进行传输。第4.1节和第4.2节讨论了设置此传输连接和资源控制通道的方法。属于不同SIP对话框的客户端和服务器之间的多个资源控制通道可以在它们之间共享一个或多个TLS或TCP连接;服务器和客户端必须支持此操作模式。客户机和服务器必须使用单个MRCPv2消息的信道标识符标题字段中携带的MRCPv2信道标识符来区分来自不同资源信道的MRCPv2消息(有关详细信息,请参阅第6.2.1节)。所有MRCPv2服务器必须支持TLS。服务器可在受控环境(例如,不在公共互联网中)中使用不带TLS的TCP,其中两个节点都位于受保护的周界内,例如,阻止受控周界外的远程节点访问MRCP服务器。由客户机选择MRCPv2会话要使用的传输模式。
Most examples from here on show only the MRCPv2 messages and do not show the SIP messages that may have been used to establish the MRCPv2 control channel.
这里的大多数示例仅显示MRCPv2消息,而不显示可能已用于建立MRCPv2控制通道的SIP消息。
If an MRCP client notices that the underlying connection has been closed for one of its MRCP channels, and it has not previously initiated a re-INVITE to close that channel, it MUST send a BYE to close down the SIP dialog and all other MRCP channels. If an MRCP server notices that the underlying connection has been closed for one of its MRCP channels, and it has not previously received and accepted a re-INVITE closing that channel, then it MUST send a BYE to close down the SIP dialog and all other MRCP channels.
如果MRCP客户端注意到其其中一个MRCP通道的基础连接已关闭,并且之前未发起重新邀请以关闭该通道,则必须发送BYE以关闭SIP对话框和所有其他MRCP通道。如果MRCP服务器注意到其一个MRCP通道的基础连接已关闭,并且之前未收到并接受关闭该通道的重新邀请,则必须发送BYE以关闭SIP对话框和所有其他MRCP通道。
Except as otherwise indicated, MRCPv2 messages are Unicode encoded in UTF-8 (RFC 3629 [RFC3629]) to allow many different languages to be represented. DEFINE-GRAMMAR (Section 9.8), for example, is one such exception, since its body can contain arbitrary XML in arbitrary (but specified via XML) encodings. MRCPv2 also allows message bodies to be represented in other character sets (for example, ISO 8859-1 [ISO.8859-1.1987]) because, in some locales, other character sets are already in widespread use. The MRCPv2 headers (the first line of an MRCP message) and header field names use only the US-ASCII subset of UTF-8.
除非另有说明,MRCPv2消息以UTF-8(RFC 3629[RFC3629])进行Unicode编码,以允许表示多种不同的语言。例如,DEFINE-GRAMMAR(第9.8节)就是这样一个例外,因为它的主体可以在任意(但通过XML指定)编码中包含任意XML。MRCPv2还允许在其他字符集(例如,ISO 8859-1[ISO.8859-1.1987])中表示消息正文,因为在某些地区,其他字符集已经广泛使用。MRCPv2标头(MRCP消息的第一行)和标头字段名仅使用UTF-8的US-ASCII子集。
Lines are terminated by CRLF (carriage return, then line feed). Also, some parameters in the message may contain binary data or a record spanning multiple lines. Such fields have a length value associated with the parameter, which indicates the number of octets immediately following the parameter.
行由CRLF终止(回车,然后换行)。此外,消息中的某些参数可能包含二进制数据或跨多行的记录。此类字段具有与参数关联的长度值,该值指示紧跟在参数之后的八位字节数。
The MRCPv2 message set consists of requests from the client to the server, responses from the server to the client, and asynchronous events from the server to the client. All these messages consist of a start-line, one or more header fields, an empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields, and an optional message body.
MRCPv2消息集包括从客户端到服务器的请求、从服务器到客户端的响应以及从服务器到客户端的异步事件。所有这些消息都包括一个起始行、一个或多个标题字段、一个空行(即CRLF前面没有任何内容的行)和一个可选的消息正文。
generic-message = start-line message-header CRLF [ message-body ]
message-body = *OCTET
start-line = request-line / response-line / event-line
message-header = 1*(generic-header / resource-header / generic-field)
resource-header = synthesizer-header / recognizer-header / recorder-header / verifier-header
The message-body contains resource-specific and message-specific data. The actual media types used to carry the data are specified in the sections defining the individual messages. Generic header fields are described in Section 6.2.
消息正文包含特定于资源和特定于消息的数据。用于传输数据的实际媒体类型在定义单个消息的部分中指定。第6.2节描述了通用标题字段。
If a message contains a message body, the message MUST contain content-headers indicating the media type and encoding of the data in the message body.
如果消息包含消息正文,则消息必须包含指示消息正文中数据的媒体类型和编码的内容头。
Request, response and event messages (described in following sections) include the version of MRCP that the message conforms to. Version compatibility rules follow [H3.1] regarding version ordering, compliance requirements, and upgrading of version numbers. The version information is indicated by "MRCP" (as opposed to "HTTP" in [H3.1]) or "MRCP/2.0" (as opposed to "HTTP/1.1" in [H3.1]).
请求、响应和事件消息(在以下章节中描述)包括消息所遵循的MRCP版本。版本兼容性规则遵循[H3.1],涉及版本排序、合规性要求和版本号升级。版本信息由"MRCP"(与[H3.1]中的"HTTP"相对)或"MRCP/2.0"(与[H3.1]中的"HTTP/1.1"相对)表示。
To be compliant with this specification, clients and servers sending MRCPv2 messages MUST indicate an mrcp-version of "MRCP/2.0". ABNF productions using mrcp-version can be found in Sections 5.2, 5.3, and 5.5.
为了符合本规范,发送MRCPv2消息的客户端和服务器必须指明mrcp-version为"MRCP/2.0"。使用mrcp-version的ABNF产生式可在第5.2、5.3和5.5节中找到。
mrcp-version = "MRCP" "/" 1*2DIGIT "." 1*2DIGIT
The message-length field specifies the length of the message in octets, including the start-line, and MUST be the second token from the beginning of the message. This is to make the framing and parsing of the message simpler to do. This field specifies the length of the message including data that may be encoded into the body of the message. Note that this value MAY be given as a fixed-length integer that is zero-padded (with leading zeros) in order to eliminate or reduce inefficiency in cases where the message-length value would change as a result of the length of the message-length token itself. This value, as with all lengths in MRCP, is to be interpreted as a base-10 number. In particular, leading zeros do not indicate that the value is to be interpreted as a base-8 number.
message length字段以八位字节(包括起始行)指定消息的长度,并且必须是消息开头的第二个令牌。这是为了简化消息的框架和解析。此字段指定消息的长度,包括可编码到消息正文中的数据。注意,该值可以作为一个固定长度的整数给出,该整数是零填充的(带前导零),以便在消息长度值将由于消息长度令牌本身的长度而改变的情况下消除或降低效率低下。与MRCP中的所有长度一样,该值应解释为基数为10的数字。特别是,前导零并不表示该值将被解释为基数为8的数字。
message-length = 1*19DIGIT
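As a non-normative example of the zero-padding technique mentioned above, the following Python sketch frames a message with a fixed-width message-length field, so that inserting the length token cannot change the overall length. The function name and the six-digit width are arbitrary editorial choices.

def frame_mrcp_message(start_line_rest: str, headers_and_body: str,
                       length_digits: int = 6) -> str:
    # start_line_rest is everything after the length field, e.g. "SET-PARAMS 543256";
    # headers_and_body is the remainder of the message, CRLF-terminated.
    crlf = "\r\n"
    placeholder = "0" * length_digits
    draft = f"MRCP/2.0 {placeholder} {start_line_rest}{crlf}{headers_and_body}"
    total = len(draft.encode("utf-8"))          # length in octets, start-line included
    if len(str(total)) > length_digits:
        raise ValueError("length field too narrow for this message")
    return f"MRCP/2.0 {str(total).zfill(length_digits)} {start_line_rest}{crlf}{headers_and_body}"

msg = frame_mrcp_message("SET-PARAMS 543256",
                         "Channel-Identifier:32AECB23433802@speechsynth\r\n\r\n")
print(msg.split()[1])    # the zero-padded octet count is the second token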
The following sample MRCP exchange demonstrates proper message-length values. The values for message-length have been removed from all other examples in the specification and replaced by '...' to reduce confusion in the case of minor message-length computation errors in those examples.
下面的MRCP交换示例演示了正确的消息长度值。已从规范中的所有其他示例中删除消息长度值,并用“…”替换,以减少在这些示例中出现较小消息长度计算错误时的混淆。
C->S: MRCP/2.0 877 INTERPRET 543266 Channel-Identifier:32AECB23433801@speechrecog Interpret-Text:may I speak to Andre Roy Content-Type:application/srgs+xml Content-ID:<request1@form-level.store> Content-Length:661
<?xml version="1.0"?> <!-- the default grammar language is US English --> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" root="request"> <!-- single language attachment to tokens --> <rule id="yes"> <one-of> <item xml:lang="fr-CA">oui</item> <item xml:lang="en-US">yes</item> </one-of> </rule>
<!-- single language attachment to a rule expansion --> <rule id="request"> may I speak to <one-of xml:lang="fr-CA"> <item>Michel Tremblay</item> <item>Andre Roy</item> </one-of> </rule> </grammar>
S->C: MRCP/2.0 82 543266 200 IN-PROGRESS Channel-Identifier:32AECB23433801@speechrecog
S->C: MRCP/2.0 634 INTERPRETATION-COMPLETE 543266 200 COMPLETE Channel-Identifier:32AECB23433801@speechrecog Completion-Cause:000 success Content-Type:application/nlsml+xml Content-Length:441
<?xml version="1.0"?> <result xmlns="urn:ietf:params:xml:ns:mrcpv2" xmlns:ex="http://www.example.com/example" grammar="session:request1@form-level.store"> <interpretation> <instance name="Person"> <ex:Person> <ex:Name> Andre Roy </ex:Name> </ex:Person> </instance> <input> may I speak to Andre Roy </input> </interpretation> </result>
All MRCPv2 messages, responses and events MUST carry the Channel-Identifier header field so the server or client can differentiate messages from different control channels that may share the same transport connection.
所有MRCPv2消息、响应和事件必须带有通道标识符标头字段,以便服务器或客户端能够区分来自可能共享同一传输连接的不同控制通道的消息。
In the resource-specific header field descriptions in Sections 8-11, a header field is disallowed on a method (request, response, or event) for that resource unless specifically listed as being allowed. Also, the phrasing "This header field MAY occur on method X" indicates that the header field is allowed on that method but is not required to be used in every instance of that method.
在第8-11节中的资源特定头字段描述中,不允许在该资源的方法(请求、响应或事件)上使用头字段,除非明确列出为允许。此外,短语“此标头字段可能出现在方法X上”表示允许在该方法上使用标头字段,但不要求在该方法的每个实例中使用。
An MRCPv2 request consists of a Request line followed by the message header section and an optional message body containing data specific to the request message.
MRCPv2请求包括一个请求行,后跟消息头部分和一个可选的消息体,其中包含特定于请求消息的数据。
The Request message from a client to the server includes within the first line the method to be applied, a method tag for that request and the version of the protocol in use.
从客户端到服务器的请求消息在第一行中包括要应用的方法、该请求的方法标记和正在使用的协议版本。
request-line = mrcp-version SP message-length SP method-name SP request-id CRLF
The mrcp-version field is the MRCP protocol version that is being used by the client.
mrcp版本字段是客户端正在使用的mrcp协议版本。
The message-length field specifies the length of the message, including the start-line.
消息长度字段指定消息的长度,包括起始行。
Details about the mrcp-version and message-length fields are given in Section 5.1.
有关mrcp版本和消息长度字段的详细信息,请参见第5.1节。
The method-name field identifies the specific request that the client is making to the server. Each resource supports a subset of the MRCPv2 methods. The subset for each resource is defined in the section of the specification for the corresponding resource.
“方法名称”字段标识客户端向服务器发出的特定请求。每个资源都支持MRCPv2方法的子集。每个资源的子集在相应资源的规范部分中定义。
method-name = generic-method / synthesizer-method / recognizer-method / recorder-method / verifier-method
The request-id field is a unique identifier representable as an unsigned 32-bit integer created by the client and sent to the server. Clients MUST utilize monotonically increasing request-ids for consecutive requests within an MRCP session. The request-id space is linear (i.e., not mod(32)), so the space does not wrap, and validity can be checked with a simple unsigned comparison operation. The client may choose any initial value for its first request, but a small integer is RECOMMENDED to avoid exhausting the space in long sessions. If the server receives duplicate or out-of-order requests, the server MUST reject the request with a response code of 410. Since request-ids are scoped to the MRCP session, they are unique across all TCP connections and all resource channels in the session.
请求id字段是一个唯一标识符,表示为客户端创建并发送到服务器的无符号32位整数。对于MRCP会话中的连续请求,客户端必须使用单调递增的请求ID。请求id空间是线性的(即,不是mod(32)),因此该空间不会回绕,可以通过简单的无符号比较操作检查其有效性。客户端可以为其第一个请求选择任何初始值,但建议使用小整数,以避免在长会话中耗尽空间。如果服务器收到重复或无序的请求,则服务器必须以响应代码410拒绝该请求。由于请求ID的作用域是MRCP会话,因此它们在会话中的所有TCP连接和所有资源通道中都是唯一的。
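A non-normative sketch of request-id handling on both sides, assuming a simple per-session allocator on the client and a last-seen check on the server; the class and function names are the editor's, while the numeric constraints come from this section.

class RequestIdAllocator:
    MAX = 2**32 - 1                      # representable as an unsigned 32-bit integer

    def __init__(self, first_id: int = 1):   # a small initial value is RECOMMENDED
        self.next_id = first_id

    def allocate(self) -> int:
        rid = self.next_id
        if rid > self.MAX:
            raise OverflowError("request-id space exhausted for this session")
        self.next_id += 1                # monotonically increasing, no wrap
        return rid

def server_accepts(last_seen: int, incoming: int) -> bool:
    # Duplicate or out-of-order request-ids are rejected with status-code 410.
    return incoming > last_seen

alloc = RequestIdAllocator()
print(alloc.allocate(), alloc.allocate(), server_accepts(1, 2), server_accepts(2, 2))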
The server resource MUST use the client-assigned identifier in its response to the request.
服务器资源在响应请求时必须使用客户端分配的标识符。
If the request does not complete synchronously, future asynchronous events associated with this request MUST carry the client-assigned request-id.
如果请求不是同步完成的,则与此请求关联的后续异步事件必须携带客户端分配的request-id。
request-id = 1*10DIGIT
After receiving and interpreting the request message for a method, the server resource responds with an MRCPv2 response message. The response consists of a response line followed by the message header section and an optional message body containing data specific to the method.
在接收并解释方法的请求消息之后,服务器资源将使用MRCPv2响应消息进行响应。响应包括一个响应行,后跟消息头部分和一个可选的消息体,其中包含特定于该方法的数据。
response-line = mrcp-version SP message-length SP request-id SP status-code SP request-state CRLF
The mrcp-version field MUST contain the version of the request if supported; otherwise, it MUST contain the highest version of MRCP supported by the server.
mrcp版本字段必须包含请求的版本(如果支持);否则,它必须包含服务器支持的最高版本的MRCP。
The message-length field specifies the length of the message, including the start-line.
消息长度字段指定消息的长度,包括起始行。
Details about the mrcp-version and message-length fields are given in Section 5.1.
有关mrcp版本和消息长度字段的详细信息,请参见第5.1节。
The request-id used in the response MUST match the one sent in the corresponding request message.
响应中使用的请求id必须与相应请求消息中发送的id匹配。
The status-code field is a 3-digit code representing the success or failure or other status of the request.
状态代码字段是一个3位代码,表示请求的成功或失败或其他状态。
status-code = 3DIGIT
The request-state field indicates if the action initiated by the Request is PENDING, IN-PROGRESS, or COMPLETE. The COMPLETE status means that the request was processed to completion and that there will be no more events or other messages from that resource to the client with that request-id. The PENDING status means that the request has been placed in a queue and will be processed in first-in-first-out order. The IN-PROGRESS status means that the request is being processed and is not yet complete. A PENDING or IN-PROGRESS status indicates that further Event messages may be delivered with that request-id.
请求状态字段指示请求发起的操作是挂起、正在进行还是已完成。“完成”状态表示请求已被处理到完成状态,并且将不会有更多事件或其他消息从该资源发送到具有该请求id的客户端。“挂起”状态表示请求已放入队列中,并将以先进先出的顺序处理。“正在进行”状态表示请求正在处理中,尚未完成。挂起或进行中状态表示可能会使用该请求id传递更多事件消息。
request-state = "COMPLETE" / "IN-PROGRESS" / "PENDING"
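Putting the response-line fields together, a non-normative Python sketch of a parser might look like this (the function name and the returned dictionary keys are illustrative):

def parse_response_line(line: str) -> dict:
    # Split an MRCPv2 response-line into its five space-separated fields.
    version, length, request_id, status, state = line.rstrip("\r\n").split(" ")
    assert state in ("COMPLETE", "IN-PROGRESS", "PENDING")
    return {
        "mrcp-version": version,
        "message-length": int(length),
        "request-id": int(request_id),
        "status-code": int(status),
        "request-state": state,
    }

print(parse_response_line("MRCP/2.0 82 543266 200 IN-PROGRESS\r\n"))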
The status codes are classified under the Success (2xx), Client Failure (4xx), and Server Failure (5xx) codes.
状态代码分为成功(2xx)、客户端故障(4xx)和服务器故障(5xx)代码。
+------------+--------------------------------------------------+ | Code | Meaning | +------------+--------------------------------------------------+ | 200 | Success | | 201 | Success with some optional header fields ignored | +------------+--------------------------------------------------+
Success (2xx)
成功(2xx)
+--------+----------------------------------------------------------+ | Code | Meaning | +--------+----------------------------------------------------------+ | 401 | Method not allowed | | 402 | Method not valid in this state | | 403 | Unsupported header field | | 404 | Illegal value for header field. This is the error for a | | | syntax violation. | | 405 | Resource not allocated for this session or does not | | | exist | | 406 | Mandatory Header Field Missing | | 407 | Method or Operation Failed (e.g., Grammar compilation | | | failed in the recognizer. Detailed cause codes might be | | | available through a resource-specific header.) | | 408 | Unrecognized or unsupported message entity | | 409 | Unsupported Header Field Value. This is a value that is | | | syntactically legal but exceeds the implementation's | | | capabilities or expectations. | | 410 | Non-Monotonic or Out-of-order sequence number in request.| | 411-420| Reserved for future assignment | +--------+----------------------------------------------------------+
Client Failure (4xx)
客户端故障(4xx)
+------------+--------------------------------+ | Code | Meaning | +------------+--------------------------------+ | 501 | Server Internal Error | | 502 | Protocol Version not supported | | 503 | Reserved for future assignment | | 504 | Message too large | +------------+--------------------------------+
Server Failure (5xx)
服务器故障(5xx)
The server resource may need to communicate a change in state or the occurrence of a certain event to the client. These messages are used when a request does not complete immediately and the response returns a status of PENDING or IN-PROGRESS. The intermediate results and events of the request are indicated to the client through the event message from the server. The event message consists of an event header line followed by the message header section and an optional message body containing data specific to the event message. The header line has the request-id of the corresponding request and status value. The request-state value is COMPLETE if the request is done and this was the last event, else it is IN-PROGRESS.
服务器资源可能需要将状态的变化或某个事件的发生告知客户端。当请求未立即完成且响应返回挂起或进行中状态时,将使用这些消息。请求的中间结果和事件通过来自服务器的事件消息指示给客户端。事件消息由事件标题行、消息标题部分和包含特定于事件消息的数据的可选消息正文组成。标题行具有相应请求的请求id和状态值。如果请求已完成且这是最后一个事件,则请求状态值为COMPLETE,否则它正在进行中。
event-line = mrcp-version SP message-length SP event-name SP request-id SP request-state CRLF
The mrcp-version used here is identical to the one used in the Request/Response line and indicates the highest version of MRCP running on the server.
此处使用的mrcp版本与请求/响应行中使用的版本相同,表示服务器上运行的mrcp的最高版本。
The message-length field specifies the length of the message, including the start-line.
消息长度字段指定消息的长度,包括起始行。
Details about the mrcp-version and message-length fields are given in Section 5.1.
有关mrcp版本和消息长度字段的详细信息,请参见第5.1节。
The event-name identifies the nature of the event generated by the media resource. The set of valid event names depends on the resource generating it. See the corresponding resource-specific section of the document.
事件名称标识由媒体资源生成的事件的性质。有效事件名称集取决于生成它的资源。请参阅文档中相应的特定于资源的部分。
event-name = synthesizer-event / recognizer-event / recorder-event / verifier-event
The request-id used in the event MUST match the one sent in the request that caused this event.
事件中使用的请求id必须与导致此事件的请求中发送的id匹配。
The request-state indicates whether the Request/Command causing this event is complete or still in progress and whether it is the same as the one mentioned in Section 5.3. The final event for a request has a COMPLETE status indicating the completion of the request.
请求状态表示导致此事件的请求/命令是否已完成或仍在进行中,以及是否与第5.3节中提到的相同。请求的最终事件具有指示请求完成的完成状态。
MRCPv2 supports a set of methods and header fields that are common to all resources. These are discussed here; resource-specific methods and header fields are discussed in the corresponding resource-specific section of the document.
MRCPv2支持一组对所有资源通用的方法和头字段。这里讨论这些问题;特定于资源的方法和标题字段将在文档的相应特定于资源的部分中讨论。
MRCPv2 supports two generic methods for reading and writing the state associated with a resource.
MRCPv2支持两种通用方法来读取和写入与资源关联的状态。
generic-method = "SET-PARAMS" / "GET-PARAMS"
These are described in the following subsections.
以下各小节将对此进行说明。
The SET-PARAMS method, from the client to the server, tells the MRCPv2 resource to define parameters for the session, such as voice characteristics and prosody on synthesizers, recognition timers on recognizers, etc. If the server accepts and sets all parameters, it MUST return a response status-code of 200. If it chooses to ignore some optional header fields that can be safely ignored without affecting operation of the server, it MUST return 201.
从客户端到服务器的SET-PARAMS方法告诉MRCPv2资源为会话定义参数,例如合成器上的语音特征和韵律、识别器上的识别计时器等。如果服务器接受并设置所有参数,则必须返回200的响应状态码。如果它选择忽略一些可安全忽略而不影响服务器操作的可选标头字段,则必须返回201。
If one or more of the header fields being sent is incorrect, error 403, 404, or 409 MUST be returned as follows:
如果发送的一个或多个报头字段不正确,则必须按如下方式返回错误403、404或409:
o If one or more of the header fields being set has an illegal value, the server MUST reject the request with a 404 Illegal Value for Header Field.
o 如果正在设置的一个或多个标头字段具有非法值,则服务器必须拒绝标头字段具有404非法值的请求。
o If one or more of the header fields being set is unsupported for the resource, the server MUST reject the request with a 403 Unsupported Header Field, except as described in the next paragraph.
o 如果资源不支持正在设置的一个或多个标头字段,则服务器必须使用403不支持的标头字段拒绝请求,下一段中描述的情况除外。
o If one or more of the header fields being set has an unsupported value, the server MUST reject the request with a 409 Unsupported Header Field Value, except as described in the next paragraph.
o 如果正在设置的一个或多个标头字段具有不支持的值,则服务器必须使用409不支持的标头字段值拒绝请求,下一段中描述的情况除外。
If both error 404 and another error have occurred, only error 404 MUST be returned. If both errors 403 and 409 have occurred, but not error 404, only error 403 MUST be returned.
如果发生了错误404和另一个错误,则只能返回错误404。如果发生了错误403和409,但没有发生错误404,则只能返回错误403。
If error 403, 404, or 409 is returned, the response MUST include the bad or unsupported header fields and their values exactly as they were sent from the client. Session parameters modified using SET-PARAMS do not override parameters explicitly specified on individual requests or requests that are IN-PROGRESS.
如果返回错误403、404或409,响应必须包括错误或不支持的头字段及其值,与从客户端发送的值完全相同。使用SET-PARAMS修改的会话参数不会覆盖在单个请求或正在进行的请求上明确指定的参数。
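The precedence rules above can be condensed into a small non-normative sketch; the function name and its list-valued arguments (the offending header fields in each category) are the editor's own.

def set_params_status(illegal, unsupported_field, unsupported_value) -> int:
    # Each argument is the list of header fields that triggered that error.
    if illegal:                 # 404 takes precedence over everything else
        return 404
    if unsupported_field:       # 403 takes precedence over 409
        return 403
    if unsupported_value:
        return 409
    return 200                  # or 201 if optional fields were safely ignored

print(set_params_status([], ["Voice-gender"], ["Voice-variant"]))   # -> 403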
C->S: MRCP/2.0 ... SET-PARAMS 543256 Channel-Identifier:32AECB23433802@speechsynth Voice-gender:female Voice-variant:3
S->C: MRCP/2.0 ... 543256 200 COMPLETE Channel-Identifier:32AECB23433802@speechsynth
The GET-PARAMS method, from the client to the server, asks the MRCPv2 resource for its current session parameters, such as voice characteristics and prosody on synthesizers, recognition timers on recognizers, etc. For every header field the client sends in the request without a value, the server MUST include the header field and its corresponding value in the response. If no parameter header fields are specified by the client, then the server MUST return all the settable parameters and their values in the corresponding header section of the response, including vendor-specific parameters. Such wildcard parameter requests can be very processing-intensive, since the number of settable parameters can be large depending on the implementation. Hence, it is RECOMMENDED that the client not use the wildcard GET-PARAMS operation very often. Note that GET-PARAMS returns header field values that apply to the whole session and not values that have a request-level scope. For example, Input-Waveform-URI is a request-level header field and thus would not be returned by GET-PARAMS.
从客户端到服务器的GET-PARAMS方法要求MRCPv2资源提供其当前会话参数,例如合成器上的语音特征和韵律、识别器上的识别计时器等。对于客户端在请求中发送的每个头字段,没有值,服务器必须在响应中包含标头字段及其相应的值。如果客户端未指定任何参数头字段,则服务器必须在响应的相应头部分中返回所有可设置参数及其值,包括特定于供应商的参数。这样的通配符参数请求可能非常需要处理,因为可设置参数的数量可能很大,具体取决于实现。因此,建议客户端不要经常使用通配符GET-PARAMS操作。请注意,GET-PARAMS返回应用于整个会话的头字段值,而不是具有请求级别作用域的值。例如,输入波形URI是请求级别的头字段,因此GET-PARAMS不会返回该字段。
If all of the header fields requested are supported, the server MUST return a response status-code of 200. If some of the header fields being retrieved are unsupported for the resource, the server MUST reject the request with a 403 Unsupported Header Field. Such a response MUST include the unsupported header fields exactly as they were sent from the client, without values.
如果支持请求的所有标头字段,则服务器必须返回200的响应状态代码。如果资源不支持正在检索的某些头字段,则服务器必须使用403不支持的头字段拒绝请求。这样的响应必须包含与从客户端发送的完全相同的不受支持的头字段,而不包含值。
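A non-normative sketch of this GET-PARAMS behavior, assuming the server keeps its settable session parameters in a dictionary keyed by header-field name; the function and data layout are illustrative, and header-name case folding is omitted for brevity.

def get_params_response(session_params: dict, requested: list):
    # requested is the list of header-field names sent without values;
    # an empty list stands for the wildcard form (return every settable parameter).
    if not requested:
        return 200, dict(session_params)
    unsupported = [name for name in requested if name not in session_params]
    if unsupported:
        # The 403 response echoes the unsupported fields, still without values.
        return 403, {name: "" for name in unsupported}
    return 200, {name: session_params[name] for name in requested}

params = {"Voice-gender": "female", "Voice-variant": "3"}
print(get_params_response(params, ["Voice-gender", "Voice-variant"]))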
C->S: MRCP/2.0 ... GET-PARAMS 543256 Channel-Identifier:32AECB23433802@speechsynth Voice-gender: Voice-variant: Vendor-Specific-Parameters:com.example.param1; com.example.param2
S->C: MRCP/2.0 ... 543256 200 COMPLETE Channel-Identifier:32AECB23433802@speechsynth Voice-gender:female Voice-variant:3 Vendor-Specific-Parameters:com.example.param1="Company Name"; com.example.param2="124324234@example.com"
All MRCPv2 header fields, which include both the generic-headers defined in the following subsections and the resource-specific header fields defined later, follow the same generic format as that given in Section 3.1 of RFC 5322 [RFC5322]. Each header field consists of a name followed by a colon (":") and the value. Header field names are case-insensitive. The value MAY be preceded by any amount of LWS (linear white space), though a single SP (space) is preferred. Header fields may extend over multiple lines by preceding each extra line with at least one SP or HT (horizontal tab).
所有MRCPv2头字段,包括以下小节中定义的通用头和后面定义的资源特定头字段,遵循RFC 5322[RFC5322]第3.1节中给出的相同通用格式。每个标题字段由一个名称、一个冒号(“:”)和一个值组成。标题字段名称不区分大小写。该值前面可以有任意数量的LW(线性空白),但最好是单个SP(空白)。标题字段可以通过在每一额外行之前至少添加一个SP或HT(水平选项卡)扩展到多行。
generic-field = field-name ":" [ field-value ] field-name = token field-value = *LWS field-content *( CRLF 1*LWS field-content) field-content = <the OCTETs making up the field-value and consisting of either *TEXT or combinations of token, separators, and quoted-string>
The field-content does not include any leading or trailing LWS (i.e., linear white space occurring before the first non-whitespace character of the field-value or after the last non-whitespace character of the field-value). Such leading or trailing LWS MAY be removed without changing the semantics of the field value. Any LWS that occurs between field-content MAY be replaced with a single SP before interpreting the field value or forwarding the message downstream.
字段内容不包括任何前导或尾随LW(即,字段值的第一个非空白字符之前或字段值的最后一个非空白字符之后出现的线性空白)。在不改变字段值的语义的情况下,可以删除此类前导或尾随LW。在解释字段值或向下游转发消息之前,字段内容之间发生的任何LW都可以用单个SP替换。
MRCPv2 servers and clients MUST NOT depend on header field order. It is RECOMMENDED to send general-header fields first, followed by request-header or response-header fields, and ending with the entity-header fields. However, MRCPv2 servers and clients MUST be prepared to process the header fields in any order. The only exception to this rule is when there are multiple header fields with the same name in a message.
Multiple header fields with the same name MAY be present in a message if and only if the entire value for that header field is defined as a comma-separated list [i.e., #(values)].
Since vendor-specific parameters may be order-dependent, it MUST be possible to combine multiple header fields of the same name into one "name:value" pair without changing the semantics of the message, by appending each subsequent value to the first, each separated by a comma. The order in which header fields with the same name are received is therefore significant to the interpretation of the combined header field value, and thus an intermediary MUST NOT change the order of these values when a message is forwarded.
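The following informative illustration uses hypothetical request-ids.  Because the Active-Request-Id-List value is defined as a comma-separated list, the two header fields

Active-Request-Id-List:543257
Active-Request-Id-List:543258,543259

could be combined, without changing the meaning of the message, into the single header field

Active-Request-Id-List:543257,543258,543259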
generic-header = channel-identifier
               / accept
               / active-request-id-list
               / proxy-sync-id
               / accept-charset
               / content-type
               / content-id
               / content-base
               / content-encoding
               / content-location
               / content-length
               / fetch-timeout
               / cache-control
               / logging-tag
               / set-cookie
               / vendor-specific
All MRCPv2 requests, responses, and events MUST contain the Channel-Identifier header field. The value is allocated by the server when a control channel is added to the session and communicated to the client by the "a=channel" attribute in the SDP answer from the server. The header field value consists of 2 parts separated by the '@' symbol. The first part is an unambiguous string identifying the MRCPv2 session. The second part is a string token that specifies one of the media processing resource types listed in Section 3.1. The unambiguous string (first part) MUST be difficult to guess, unique among the resource instances managed by the server, and common to all resource channels with that server established through a single SIP dialog.
channel-identifier = "Channel-Identifier" ":" channel-id CRLF
channel-id         = 1*alphanum "@" 1*alphanum
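As an informative illustration (reusing the session string from the earlier example), two control channels established through the same SIP dialog would share the first part of the identifier and differ only in the resource type token:

Channel-Identifier:32AECB23433802@speechsynth
Channel-Identifier:32AECB23433802@speechrecog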
The Accept header field follows the syntax defined in [H14.1]. The semantics are also identical, with the exception that if no Accept header field is present, the server MUST assume a default value that is specific to the resource type that is being controlled. This default value can be changed for a resource on a session by sending this header field in a SET-PARAMS method. The current default value of this header field for a resource in a session can be found through a GET-PARAMS method. This header field MAY occur on any request.
In a request, this header field indicates the list of request-ids to which the request applies. This is useful when there are multiple requests that are PENDING or IN-PROGRESS and the client wants this request to apply to one or more of these specifically.
In a response, this header field returns the list of request-ids that the method modified or affected. There could be one or more requests in a request-state of PENDING or IN-PROGRESS. When a method affecting one or more PENDING or IN-PROGRESS requests is sent from the client to the server, the response MUST contain the list of request-ids that were affected or modified by this command in its header section.
The Active-Request-Id-List is only used in requests and responses, not in events.
For example, if a STOP request with no Active-Request-Id-List is sent to a synthesizer resource that has one or more SPEAK requests in the PENDING or IN-PROGRESS state, all SPEAK requests MUST be cancelled, including the one IN-PROGRESS. The response to the STOP request contains in the Active-Request-Id-List value the request-ids of all the SPEAK requests that were terminated. After sending the STOP response, the server MUST NOT send any SPEAK-COMPLETE or RECOGNITION-COMPLETE events for the terminated requests.
active-request-id-list = "Active-Request-Id-List" ":" request-id *("," request-id) CRLF
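The following informative exchange (the request-ids and channel identifier are hypothetical) shows a STOP request that targets two specific SPEAK requests in the PENDING or IN-PROGRESS state, and the response reporting the requests it affected:

C->S: MRCP/2.0 ... STOP 543260
      Channel-Identifier:32AECB23433802@speechsynth
      Active-Request-Id-List:543258,543259

S->C: MRCP/2.0 ... 543260 200 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth
      Active-Request-Id-List:543258,543259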
When any server resource generates a "barge-in-able" event, it also generates a unique tag. The tag is sent as this header field's value in an event to the client. The client then acts as an intermediary among the server resources and sends a BARGE-IN-OCCURRED method to the synthesizer server resource with the Proxy-Sync-Id it received
from the server resource. When the recognizer and synthesizer resources are part of the same session, they may choose to work together to achieve quicker interaction and response. Here, the Proxy-Sync-Id helps the resource receiving the event, intermediated by the client, to decide if this event has been processed through a direct interaction of the resources. This header field MAY occur only on events and the BARGE-IN-OCCURRED method. The name of this header field contains the word 'proxy' only for historical reasons and does not imply that a proxy server is involved.
proxy-sync-id = "Proxy-Sync-Id" ":" 1*VCHAR CRLF
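As an informative sketch (the tag value, request-ids, and channel identifiers are hypothetical), a recognizer resource might report a barge-in-able START-OF-INPUT event carrying a Proxy-Sync-Id, which the client then echoes in a BARGE-IN-OCCURRED method to the synthesizer resource:

S->C: MRCP/2.0 ... START-OF-INPUT 543261 IN-PROGRESS
      Channel-Identifier:32AECB23433802@speechrecog
      Proxy-Sync-Id:987654321

C->S: MRCP/2.0 ... BARGE-IN-OCCURRED 543262
      Channel-Identifier:32AECB23433802@speechsynth
      Proxy-Sync-Id:987654321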
See [H14.2]. This specifies the acceptable character sets for entities returned in the response or events associated with this request. This is useful in specifying the character set to use in the Natural Language Semantic Markup Language (NLSML) results of a RECOGNITION-COMPLETE event. This header field is only used on requests.
See [H14.17]. MRCPv2 supports a restricted set of registered media types for content, including speech markup, grammar, and recognition results. The content types applicable to each MRCPv2 resource-type are specified in the corresponding section of the document and are registered in the MIME Media Types registry maintained by IANA. The multipart content type "multipart/mixed" is supported to communicate multiple of the above mentioned contents, in which case the body parts MUST NOT contain any MRCPv2-specific header fields. This header field MAY occur on all messages.
content-type     = "Content-Type" ":" media-type-value CRLF
media-type-value = type "/" subtype *( ";" parameter )
type             = token
subtype          = token
parameter        = attribute "=" value
attribute        = token
value            = token / quoted-string
This header field contains an ID or name for the content by which it can be referenced. This header field operates according to the specification in RFC 2392 [RFC2392] and is required for content disambiguation in multipart messages. In MRCPv2, whenever the associated content is stored by either the client or the server, it MUST be retrievable using this ID. Such content can be referenced later in a session by addressing it with the 'session' URI scheme described in Section 13.6. This header field MAY occur on all messages.
The Content-Base entity-header MAY be used to specify the base URI for resolving relative URIs within the entity.
content-base = "Content-Base" ":" absoluteURI CRLF
Note, however, that the base URI of the contents within the entity-body may be redefined within that entity-body. An example of this would be multipart media, which in turn can have multiple entities within it. This header field MAY occur on all messages.
The Content-Encoding entity-header is used as a modifier to the Content-Type. When present, its value indicates what additional content encoding has been applied to the entity-body, and thus what decoding mechanisms must be applied in order to obtain the Media Type referenced by the Content-Type header field. Content-Encoding is primarily used to allow a document to be compressed without losing the identity of its underlying media type. Note that the SIP session can be used to determine accepted encodings (see Section 7). This header field MAY occur on all messages.
content-encoding = "Content-Encoding" ":" *WSP content-coding *(*WSP "," *WSP content-coding *WSP ) CRLF
Content codings are defined in [H3.5]. An example of its use is Content-Encoding:gzip
If multiple encodings have been applied to an entity, the content encodings MUST be listed in the order in which they were applied.
The Content-Location entity-header MAY be used to supply the resource location for the entity enclosed in the message when that entity is accessible from a location separate from the requested resource's URI. Refer to [H14.14].
content-location = "Content-Location" ":" ( absoluteURI / relativeURI ) CRLF
The Content-Location value is a statement of the location of the resource corresponding to this particular entity at the time of the request. This header field is provided for optimization purposes only. The receiver of this header field MAY assume that the entity being sent is identical to what would have been retrieved or might already have been retrieved from the Content-Location URI.
For example, if the client provided a grammar markup inline, and it had previously retrieved it from a certain URI, that URI can be provided as part of the entity, using the Content-Location header field. This allows a resource like the recognizer to look into its cache to see if this grammar was previously retrieved, compiled, and cached. In this case, it might optimize by using the previously compiled grammar object.
If the Content-Location is a relative URI, the relative URI is interpreted relative to the Content-Base URI. This header field MAY occur on all messages.
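As an informative illustration (the URIs are hypothetical), a relative Content-Location is resolved against the Content-Base; here the enclosed entity is identified as 'http://www.example.com/grammars/menu.grxml':

Content-Base:http://www.example.com/grammars/
Content-Location:menu.grxml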
This header field contains the length of the content of the message body (i.e., after the double CRLF following the last header field). Unlike in HTTP, it MUST be included in all messages that carry content beyond the header section. If it is missing, a default value of zero is assumed. Otherwise, it is interpreted according to [H14.13]. When a message having no use for a message body contains one, i.e., the Content-Length is non-zero, the receiver MUST ignore the content of the message body. This header field MAY occur on all messages.
content-length = "Content-Length" ":" 1*19DIGIT CRLF
When the recognizer or synthesizer needs to fetch documents or other resources, this header field controls the corresponding URI access properties. This defines the timeout for content that the server may
need to fetch over the network. The value is interpreted to be in milliseconds and ranges from 0 to an implementation-specific maximum value. It is RECOMMENDED that servers be cautious about accepting long timeout values. The default value for this header field is implementation specific. This header field MAY occur in DEFINE-GRAMMAR, RECOGNIZE, SPEAK, SET-PARAMS, or GET-PARAMS.
fetch-timeout = "Fetch-Timeout" ":" 1*19DIGIT CRLF
If the server implements content caching, it MUST adhere to the cache correctness rules of HTTP 1.1 [RFC2616] when accessing and caching stored content. In particular, the "expires" and "cache-control" header fields of the cached URI or document MUST be honored and take precedence over the Cache-Control defaults set by this header field. The Cache-Control directives are used to define the default caching algorithms on the server for the session or request. The scope of the directive is based on the method it is sent on. If the directive is sent on a SET-PARAMS method, it applies for all requests for external documents the server makes during that session, unless it is overridden by a Cache-Control header field on an individual request. If the directives are sent on any other requests, they apply only to external document requests the server makes for that request. An empty Cache-Control header field on the GET-PARAMS method is a request for the server to return the current Cache-Control directives setting on the server. This header field MAY occur only on requests.
cache-control   = "Cache-Control" ":"
                  [*WSP cache-directive
                  *( *WSP "," *WSP cache-directive *WSP )] CRLF
cache-directive = "max-age" "=" delta-seconds
                / "max-stale" [ "=" delta-seconds ]
                / "min-fresh" "=" delta-seconds
delta-seconds   = 1*19DIGIT
Here, delta-seconds is a decimal time value specifying the number of seconds since the instant the message response or data was received by the server.
The different cache-directive options allow the client to ask the server to override the default cache expiration mechanisms:
max-age Indicates that the client can tolerate the server using content whose age is no greater than the specified time in seconds. Unless a "max-stale" directive is also included, the client is not willing to accept a response based on stale data.
min-fresh Indicates that the client is willing to accept a server response with cached data whose expiration is no less than its current age plus the specified time in seconds. If the server's cache time-to-live exceeds the client-supplied min-fresh value, the server MUST NOT utilize cached content.
max-stale Indicates that the client is willing to allow a server to utilize cached data that has exceeded its expiration time. If "max-stale" is assigned a value, then the client is willing to allow the server to use cached data that has exceeded its expiration time by no more than the specified number of seconds. If no value is assigned to "max-stale", then the client is willing to allow the server to use stale data of any age.
If the server cache is requested to use stale response/data without validation, it MAY do so only if this does not conflict with any "MUST"-level requirements concerning cache validation (e.g., a "must-revalidate" Cache-Control directive in the HTTP 1.1 specification pertaining to the corresponding URI).
If both the MRCPv2 Cache-Control directive and the cached entry on the server include "max-age" directives, then the lesser of the two values is used for determining the freshness of the cached entry for that request.
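For example, the following informative SET-PARAMS exchange (the request-id and value are hypothetical) establishes a session-wide default that lets the server use fetched documents that are up to one hour old:

C->S: MRCP/2.0 ... SET-PARAMS 543263
      Channel-Identifier:32AECB23433802@speechsynth
      Cache-Control:max-age=3600

S->C: MRCP/2.0 ... 543263 200 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth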
This header field MAY be sent as part of a SET-PARAMS/GET-PARAMS method to set or retrieve the logging tag for logs generated by the server. Once set, the value persists until a new value is set or the session ends. The MRCPv2 server MAY provide a mechanism to create subsets of its output logs so that system administrators can examine or extract only the log file portion during which the logging tag was set to a certain value.
It is RECOMMENDED that clients include in the logging tag information to identify the MRCPv2 client User Agent, so that one can determine which MRCPv2 client request generated a given log message at the server. It is also RECOMMENDED that MRCPv2 clients not log
personally identifiable information such as credit card numbers and national identification numbers.
logging-tag = "Logging-Tag" ":" 1*UTFCHAR CRLF
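An informative example (the tag value is hypothetical) that sets a logging tag identifying the client and call for the remainder of the session:

C->S: MRCP/2.0 ... SET-PARAMS 543264
      Channel-Identifier:32AECB23433802@speechsynth
      Logging-Tag:client27-call1234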
Since the associated HTTP client on an MRCPv2 server fetches documents for processing on behalf of the MRCPv2 client, the cookie store in the HTTP client of the MRCPv2 server is treated as an extension of the cookie store in the HTTP client of the MRCPv2 client. This requires that the MRCPv2 client and server be able to synchronize their common cookie store as needed. To enable the MRCPv2 client to push its stored cookies to the MRCPv2 server and get new cookies from the MRCPv2 server stored back to the MRCPv2 client, the Set-Cookie entity-header field MAY be included in MRCPv2 requests to update the cookie store on a server and be returned in final MRCPv2 responses or events to subsequently update the client's own cookie store. The stored cookies on the server persist for the duration of the MRCPv2 session and MUST be destroyed at the end of the session. To ensure support for cookies, MRCPv2 clients and servers MUST support the Set-Cookie entity-header field.
Note that it is the MRCPv2 client that determines which, if any, cookies are sent to the server. There is no requirement that all cookies be shared. Rather, it is RECOMMENDED that MRCPv2 clients communicate only cookies needed by the MRCPv2 server to process its requests.
set-cookie = "Set-Cookie:" cookies CRLF
cookies    = cookie *("," *LWS cookie)
cookie     = attribute "=" value *(";" cookie-av)
cookie-av  = "Comment" "=" value
           / "Domain" "=" value
           / "Max-Age" "=" value
           / "Path" "=" value
           / "Secure"
           / "Version" "=" 1*19DIGIT
           / "Age" "=" delta-seconds

set-cookie        = "Set-Cookie:" SP set-cookie-string
set-cookie-string = cookie-pair *( ";" SP cookie-av )
cookie-pair       = cookie-name "=" cookie-value
cookie-name       = token
cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
token             = <token, defined in [RFC2616], Section 2.2>

cookie-av         = expires-av / max-age-av / domain-av / path-av
                  / secure-av / httponly-av / extension-av / age-av
expires-av        = "Expires=" sane-cookie-date
sane-cookie-date  = <rfc1123-date, defined in [RFC2616],
                    Section 3.3.1>
max-age-av        = "Max-Age=" non-zero-digit *DIGIT
non-zero-digit    = %x31-39
domain-av         = "Domain=" domain-value
domain-value      = <subdomain>
path-av           = "Path=" path-value
path-value        = <any CHAR except CTLs or ";">
secure-av         = "Secure"
httponly-av       = "HttpOnly"
extension-av      = <any CHAR except CTLs or ";">
age-av            = "Age=" delta-seconds
The Set-Cookie header field is specified in RFC 6265 [RFC6265]. The "Age" attribute is introduced in this specification to indicate the age of the cookie and is OPTIONAL. An MRCPv2 client or server MUST calculate the age of the cookie according to the age calculation rules in the HTTP/1.1 specification [RFC2616] and append the "Age" attribute accordingly. This attribute is provided because time may have passed since the client received the cookie from an HTTP server. Rather than having the client reduce Max-Age by the actual age, it passes Max-Age verbatim and appends the "Age" attribute, thus maintaining the cookie as received while still accounting for the fact that time has passed.
The MRCPv2 client or server MUST supply defaults for the "Domain" and "Path" attributes, as specified in RFC 6265, if they are omitted by the HTTP origin server. Note that there is no leading dot present in the "Domain" attribute value in this case. Although an explicitly specified "Domain" value received via the HTTP protocol may be modified to include a leading dot, an MRCPv2 client or server MUST NOT modify the "Domain" value when received via the MRCPv2 protocol.
An MRCPv2 client or server MAY combine multiple cookie header fields of the same type into a single "field-name:field-value" pair as described in Section 6.2.
The Set-Cookie header field MAY be specified in any request that subsequently results in the server performing an HTTP access. When a server receives new cookie information from an HTTP origin server, and assuming the cookie store is modified according to RFC 6265, the server MUST return the new cookie information in the MRCPv2 COMPLETE response or event, as appropriate, to allow the client to update its own cookie store.
The SET-PARAMS request MAY specify the Set-Cookie header field to update the cookie store on a server. The GET-PARAMS request MAY be used to return the entire cookie store of "Set-Cookie" type cookies to the client.
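As an informative sketch (the cookie name, values, and request-id are hypothetical), a client might push a cookie to the server's cookie store with SET-PARAMS before the server performs HTTP fetches on its behalf:

C->S: MRCP/2.0 ... SET-PARAMS 543265
      Channel-Identifier:32AECB23433802@speechsynth
      Set-Cookie:session=1234; Path=/; Domain=example.com; Age=120

S->C: MRCP/2.0 ... 543265 200 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth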
This set of header fields allows for the client to set or retrieve vendor-specific parameters.
vendor-specific         = "Vendor-Specific-Parameters" ":"
                          [vendor-specific-av-pair
                          *(";" vendor-specific-av-pair)] CRLF
vendor-specific-av-pair = vendor-av-pair-name "=" value
vendor-av-pair-name     = 1*UTFCHAR
Header fields of this form MAY be sent in any method (request) and are used to manage implementation-specific parameters on the server side. The vendor-av-pair-name follows the reverse Internet Domain Name convention (see Section 13.1.6 for syntax and registration information). The value of the vendor attribute is specified after the "=" symbol and MAY be quoted. For example:
com.example.companyA.paramxyz=256
com.example.companyA.paramabc=High
com.example.companyB.paramxyz=Low
When used in GET-PARAMS to get the current value of these parameters from the server, this header field value MAY contain a semicolon-separated list of implementation-specific attribute names.
Result data from the server for the Recognizer and Verifier resources is carried as a typed media entity in the MRCPv2 message body of various events. The Natural Language Semantics Markup Language (NLSML), an XML markup based on an early draft from the W3C, is the default standard for returning results back to the client. Hence, all servers implementing these resource types MUST support the media type 'application/nlsml+xml'. The Extensible MultiModal Annotation (EMMA) [W3C.REC-emma-20090210] format can be used to return results as well. This can be done by negotiating the format at session establishment time with SDP (a=resultformat:application/emma+xml) or with SIP (Allow/Accept). With SIP, for example, if a client wants
results in EMMA, an MRCPv2 server can route the request to another server that supports EMMA by inspecting the SIP header fields, rather than having to inspect the SDP.
MRCPv2 uses this representation to convey content among the clients and servers that generate and make use of the markup. MRCPv2 uses NLSML specifically to convey recognition, enrollment, and verification results between the corresponding resource on the MRCPv2 server and the MRCPv2 client. Details of this result format are fully described in Section 6.3.1.
Content-Type:application/nlsml+xml
Content-Length:...
<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="http://theYesNoGrammar">
  <interpretation>
    <instance>
      <ex:response>yes</ex:response>
    </instance>
    <input>OK</input>
  </interpretation>
</result>
Result Example
The Natural Language Semantics Markup Language (NLSML) is an XML data structure with elements and attributes designed to carry result information from recognizer (including enrollment) and verifier resources. The normative definition of NLSML is the RelaxNG schema in Section 16.1. Note that the elements and attributes of this format are defined in the MRCPv2 namespace. In the result structure, they must either be prefixed by a namespace prefix declared within the result or must be children of an element identified as belonging to the respective namespace. For details on how to use XML Namespaces, see [W3C.REC-xml-names11-20040204]. Section 2 of [W3C.REC-xml-names11-20040204] provides details on how to declare namespaces and namespace prefixes.
The root element of NLSML is <result>. Optional child elements are <interpretation>, <enrollment-result>, and <verification-result>, at least one of which must be present. A single <result> MAY contain any or all of the optional child elements. Details of the <result> and <interpretation> elements and their subelements and attributes
can be found in Section 9.6. Details of the <enrollment-result> element and its subelements can be found in Section 9.7. Details of the <verification-result> element and its subelements can be found in Section 11.5.2.
Server resources may be discovered and their capabilities learned by clients through standard SIP machinery. The client MAY issue a SIP OPTIONS transaction to a server, which has the effect of requesting the capabilities of the server. The server MUST respond to such a request with an SDP-encoded description of its capabilities according to RFC 3264 [RFC3264]. The MRCPv2 capabilities are described by a single "m=" line containing the media type "application" and transport type "TCP/TLS/MRCPv2" or "TCP/MRCPv2". There MUST be one "resource" attribute for each media resource that the server supports, and it has the resource type identifier as its value.
The SDP description MUST also contain "m=" lines describing the audio capabilities and the coders the server supports.
In this example, the client uses the SIP OPTIONS method to query the capabilities of the MRCPv2 server.
C->S: OPTIONS sip:mrcp@server.example.com SIP/2.0
      Via:SIP/2.0/TCP client.atlanta.example.com:5060;
       branch=z9hG4bK74bf7
      Max-Forwards:6
      To:<sip:mrcp@example.com>
      From:Sarvi <sip:sarvi@example.com>;tag=1928301774
      Call-ID:a84b4c76e66710
      CSeq:63104 OPTIONS
      Contact:<sip:sarvi@client.example.com>
      Accept:application/sdp
      Content-Length:0
S->C: SIP/2.0 200 OK
      Via:SIP/2.0/TCP client.atlanta.example.com:5060;
       branch=z9hG4bK74bf7;received=192.0.32.10
      To:<sip:mrcp@example.com>;tag=62784
      From:Sarvi <sip:sarvi@example.com>;tag=1928301774
      Call-ID:a84b4c76e66710
      CSeq:63104 OPTIONS
      Contact:<sip:mrcp@server.example.com>
      Allow:INVITE, ACK, CANCEL, OPTIONS, BYE
      Accept:application/sdp
      Accept-Encoding:gzip
      Accept-Language:en
      Supported:foo
      Content-Type:application/sdp
      Content-Length:...
      v=0
      o=sarvi 2890844536 2890842811 IN IP4 192.0.2.12
      s=-
      i=MRCPv2 server capabilities
      c=IN IP4 192.0.2.12/127
      t=0 0
      m=application 0 TCP/TLS/MRCPv2 1
      a=resource:speechsynth
      a=resource:speechrecog
      a=resource:speakverify
      m=audio 0 RTP/AVP 0 3
      a=rtpmap:0 PCMU/8000
      a=rtpmap:3 GSM/8000
Using SIP OPTIONS for MRCPv2 Server Capability Discovery
This resource processes text markup provided by the client and generates a stream of synthesized speech in real time. Depending upon the server implementation and capability of this resource, the client can also dictate parameters of the synthesized speech such as voice characteristics, speaker speed, etc.
The synthesizer resource is controlled by MRCPv2 requests from the client. Similarly, the resource can respond to these requests or generate asynchronous events to the client to indicate conditions of interest to the client during the generation of the synthesized speech stream.
This section applies for the following resource types:
o speechsynth
o basicsynth
The capabilities of these resources are defined in Section 3.1.
The synthesizer maintains a state machine to process MRCPv2 requests from the client. The state transitions shown below describe the states of the synthesizer and reflect the state of the request at the head of the synthesizer resource queue. A SPEAK request in the PENDING state can be deleted or stopped by a STOP request without affecting the state of the resource.
[Figure: the synthesizer state diagram shows the Idle, Speaking, and
Paused states and the transitions among them driven by the SPEAK,
STOP, SPEAK-COMPLETE, BARGE-IN-OCCURRED, CONTROL, PAUSE, RESUME,
SPEECH-MARKER, and DEFINE-LEXICON messages.]
Synthesizer State Machine
The synthesizer supports the following methods.
synthesizer-method = "SPEAK"
                   / "STOP"
                   / "PAUSE"
                   / "RESUME"
                   / "BARGE-IN-OCCURRED"
                   / "CONTROL"
                   / "DEFINE-LEXICON"
The synthesizer can generate the following events.
synthesizer-event = "SPEECH-MARKER" / "SPEAK-COMPLETE"
A synthesizer method can contain header fields containing request options and information to augment the Request, Response, or Event it is associated with.
synthesizer-header = jump-size
                   / kill-on-barge-in
                   / speaker-profile
                   / completion-cause
                   / completion-reason
                   / voice-parameter
                   / prosody-parameter
                   / speech-marker
                   / speech-language
                   / fetch-hint
                   / audio-fetch-hint
                   / failed-uri
                   / failed-uri-cause
                   / speak-restart
                   / speak-length
                   / load-lexicon
                   / lexicon-search-order
This header field MAY be specified in a CONTROL method and controls the amount to jump forward or backward in an active SPEAK request. A '+' or '-' indicates a relative value to what is being currently played. This header field MAY also be specified in a SPEAK request as a desired offset into the synthesized speech. In this case, the synthesizer MUST begin speaking from this amount of time into the speech markup. Note that an offset that extends beyond the end of
the produced speech will result in audio of length zero. The different speech length units supported are dependent on the synthesizer implementation. If the synthesizer resource does not support a unit for the operation, the resource MUST respond with a status-code of 409 "Unsupported Header Field Value".
jump-size              = "Jump-Size" ":" speech-length-value CRLF
speech-length-value    = numeric-speech-length / text-speech-length
text-speech-length     = 1*UTFCHAR SP "Tag"
numeric-speech-length  = ("+" / "-") positive-speech-length
positive-speech-length = 1*19DIGIT SP numeric-speech-unit
numeric-speech-unit    = "Second" / "Word" / "Sentence" / "Paragraph"
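An informative example (the request-id and value are hypothetical) of a CONTROL method that moves the active SPEAK request backward by 15 seconds:

C->S: MRCP/2.0 ... CONTROL 543266
      Channel-Identifier:32AECB23433802@speechsynth
      Jump-Size:-15 Second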
This header field MAY be sent as part of the SPEAK method to enable "kill-on-barge-in" support. If enabled, the SPEAK method is interrupted by DTMF input detected by a signal detector resource or by the start of speech sensed or recognized by the speech recognizer resource.
kill-on-barge-in = "Kill-On-Barge-In" ":" BOOLEAN CRLF
The client MUST send a BARGE-IN-OCCURRED method to the synthesizer resource when it receives a barge-in-able event from any source. This source could be a synthesizer resource or signal detector resource and MAY be either local or distributed. If this header field is not specified in a SPEAK request or explicitly set by a SET-PARAMS, the default value for this header field is "true".
If the recognizer or signal detector resource is on the same server as the synthesizer and both are part of the same session, the server MAY work with both to provide internal notification to the synthesizer so that audio may be stopped without having to wait for the client's BARGE-IN-OCCURRED event.
It is generally RECOMMENDED when playing a prompt to the user with Kill-On-Barge-In and asking for input, that the client issue the RECOGNIZE request ahead of the SPEAK request for optimum performance
and user experience. This way, it is guaranteed that the recognizer is online before the prompt starts playing and the user's speech will not be truncated at the beginning (especially for power users).
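An informative sketch (the request-ids, grammar URI, and prompt text are hypothetical) of the recommended ordering, in which the RECOGNIZE request is issued on the recognizer channel before the barge-in-able SPEAK request on the synthesizer channel:

C->S: MRCP/2.0 ... RECOGNIZE 543267
      Channel-Identifier:32AECB23433802@speechrecog
      Content-Type:text/uri-list
      Content-Length:...

      http://www.example.com/grammars/menu.grxml

C->S: MRCP/2.0 ... SPEAK 543268
      Channel-Identifier:32AECB23433802@speechsynth
      Kill-On-Barge-In:true
      Content-Type:text/plain
      Content-Length:...

      Please say one of the menu choices.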
This header field MAY be part of the SET-PARAMS/GET-PARAMS or SPEAK request from the client to the server and specifies a URI that references the profile of the speaker. Speaker profiles are collections of voice parameters like gender, accent, etc.
speaker-profile = "Speaker-Profile" ":" uri CRLF
This header field MUST be specified in a SPEAK-COMPLETE event coming from the synthesizer resource to the client. This indicates the reason the SPEAK request completed.
completion-cause = "Completion-Cause" ":" 3DIGIT SP 1*VCHAR CRLF
+------------+-----------------------+------------------------------+
| Cause-Code | Cause-Name            | Description                  |
+------------+-----------------------+------------------------------+
| 000        | normal                | SPEAK completed normally.    |
| 001        | barge-in              | SPEAK request was terminated |
|            |                       | because of barge-in.         |
| 002        | parse-failure         | SPEAK request terminated     |
|            |                       | because of a failure to      |
|            |                       | parse the speech markup      |
|            |                       | text.                        |
| 003        | uri-failure           | SPEAK request terminated     |
|            |                       | because access to one of the |
|            |                       | URIs failed.                 |
| 004        | error                 | SPEAK request terminated     |
|            |                       | prematurely due to           |
|            |                       | synthesizer error.           |
| 005        | language-unsupported  | Language not supported.      |
| 006        | lexicon-load-failure  | Lexicon loading failed.      |
| 007        | cancelled             | A prior SPEAK request failed |
|            |                       | while this one was still in  |
|            |                       | the queue.                   |
+------------+-----------------------+------------------------------+
Synthesizer Resource Completion Cause Codes
This header field MAY be specified in a SPEAK-COMPLETE event coming from the synthesizer resource to the client. This contains the reason text behind the SPEAK request completion. This header field communicates text describing the reason for the failure, such as an error in parsing the speech markup text.
completion-reason = "Completion-Reason" ":" quoted-string CRLF
The completion reason text is provided for client use in logs and for debugging and instrumentation purposes. Clients MUST NOT interpret the completion reason text.
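An informative example (the request-id, timestamp, and reason text are hypothetical) of a SPEAK-COMPLETE event carrying both the completion cause and a descriptive completion reason:

S->C: MRCP/2.0 ... SPEAK-COMPLETE 543269 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth
      Completion-Cause:002 parse-failure
      Completion-Reason:"line 5: unknown element"
      Speech-Marker:timestamp=857206027059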
This set of header fields defines the voice of the speaker.
voice-parameter = voice-gender / voice-age / voice-variant / voice-name
voice-gender       = "Voice-Gender:" voice-gender-value CRLF
voice-gender-value = "male" / "female" / "neutral"
voice-age          = "Voice-Age:" 1*3DIGIT CRLF
voice-variant      = "Voice-Variant:" 1*19DIGIT CRLF
voice-name         = "Voice-Name:"
                     1*UTFCHAR *(1*WSP 1*UTFCHAR) CRLF
The "Voice-" parameters are derived from the similarly named attributes of the voice element specified in W3C's Speech Synthesis Markup Language Specification (SSML) [W3C.REC-speech-synthesis-20040907]. Legal values for these parameters are as defined in that specification.
These header fields MAY be sent in SET-PARAMS or GET-PARAMS requests to define or get default values for the entire session or MAY be sent in the SPEAK request to define default values for that SPEAK request. Note that SSML content can itself set these values internal to the SSML document, of course.
Voice parameter header fields MAY also be sent in a CONTROL method to affect a SPEAK request in progress and change its behavior on the fly. If the synthesizer resource does not support this operation, it MUST reject the request with a status-code of 403 "Unsupported Header Field".
This set of header fields defines the prosody of the speech.
prosody-parameter   = "Prosody-" prosody-param-name ":"
                      prosody-param-value CRLF
prosody-param-name  = 1*VCHAR
prosody-param-value = 1*VCHAR
prosody-param-name is any one of the attribute names under the prosody element specified in W3C's Speech Synthesis Markup Language Specification [W3C.REC-speech-synthesis-20040907]. The prosody-param-value is any one of the value choices of the corresponding prosody element attribute from that specification.
These header fields MAY be sent in SET-PARAMS or GET-PARAMS requests to define or get default values for the entire session or MAY be sent in the SPEAK request to define default values for that SPEAK request. Furthermore, these attributes can be part of the speech text marked up in SSML.
The prosody parameter header fields in the SET-PARAMS or SPEAK request only apply if the speech data is of type 'text/plain' and does not use a speech markup format.
These prosody parameter header fields MAY also be sent in a CONTROL method to affect a SPEAK request in progress and change its behavior on the fly. If the synthesizer resource does not support this operation, it MUST respond back to the client with a status-code of 403 "Unsupported Header Field".
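An informative example (the request-id, attribute choices, and prompt text are hypothetical; "rate" and "volume" are prosody attribute names from the SSML specification) of prosody defaults supplied on a SPEAK request carrying plain text:

C->S: MRCP/2.0 ... SPEAK 543270
      Channel-Identifier:32AECB23433802@speechsynth
      Prosody-rate:slow
      Prosody-volume:soft
      Content-Type:text/plain
      Content-Length:...

      Your order has been placed.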
This header field contains timestamp information in a "timestamp" field. This is a Network Time Protocol (NTP) [RFC5905] timestamp, a 64-bit number in decimal form. It MUST be synced with the Real-Time Protocol (RTP) [RFC3550] timestamp of the media stream through the Real-Time Control Protocol (RTCP) [RFC3550].
Markers are bookmarks that are defined within the markup. Most speech markup formats provide mechanisms to embed marker fields within speech texts. The synthesizer generates SPEECH-MARKER events when it reaches these marker fields. This header field MUST be part of the SPEECH-MARKER event and contain the marker tag value after the timestamp, separated by a semicolon. In these events, the timestamp marks the time the text corresponding to the marker was emitted as speech by the synthesizer.
This header field MUST also be returned in responses to STOP, CONTROL, and BARGE-IN-OCCURRED methods, in the SPEAK-COMPLETE event, and in an IN-PROGRESS SPEAK response. In these messages, if any markers have been encountered for the current SPEAK, the marker tag value MUST be the last embedded marker encountered. If no markers have yet been encountered for the current SPEAK, only the timestamp is REQUIRED. Note that in these events, the purpose of this header field is to provide timestamp information associated with important events within the lifecycle of a request (start of SPEAK processing, end of SPEAK processing, receipt of CONTROL/STOP/BARGE-IN-OCCURRED).
timestamp        = "timestamp" "=" time-stamp-value
time-stamp-value = 1*20DIGIT
speech-marker    = "Speech-Marker" ":" timestamp
                   [";" 1*(UTFCHAR / %x20)] CRLF
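An informative example (the request-id, timestamp, and marker name are hypothetical) of a SPEECH-MARKER event reporting that the embedded marker named "marker-1" has just been rendered:

S->C: MRCP/2.0 ... SPEECH-MARKER 543271 IN-PROGRESS
      Channel-Identifier:32AECB23433802@speechsynth
      Speech-Marker:timestamp=857206027059;marker-1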
This header field specifies the default language of the speech data if the language is not specified in the markup. The value of this header field MUST follow RFC 5646 [RFC5646] for its values. The header field MAY occur in SPEAK, SET-PARAMS, or GET-PARAMS requests.
speech-language = "Speech-Language" ":" 1*VCHAR CRLF
When the synthesizer needs to fetch documents or other resources like speech markup or audio files, this header field controls the corresponding URI access properties. This provides client policy on when the synthesizer should retrieve content from the server. A value of "prefetch" indicates the content MAY be downloaded when the request is received, whereas "safe" indicates that content MUST NOT
be downloaded until actually referenced. The default value is "prefetch". This header field MAY occur in SPEAK, SET-PARAMS, or GET-PARAMS requests.
fetch-hint = "Fetch-Hint" ":" ("prefetch" / "safe") CRLF
When the synthesizer needs to fetch documents or other resources like speech audio files, this header field controls the corresponding URI access properties. This provides client policy whether or not the synthesizer is permitted to attempt to optimize speech by pre-fetching audio. The value is either "safe" to say that audio is only fetched when it is referenced, never before; "prefetch" to permit, but not require the implementation to pre-fetch the audio; or "stream" to allow it to stream the audio fetches. The default value is "prefetch". This header field MAY occur in SPEAK, SET-PARAMS, or GET-PARAMS requests.
audio-fetch-hint = "Audio-Fetch-Hint" ":" ("prefetch" / "safe" / "stream") CRLF
When a synthesizer method needs a synthesizer to fetch or access a URI and the access fails, the server SHOULD provide the failed URI in this header field in the method response, unless there are multiple URI failures, in which case the server MUST provide one of the failed URIs in this header field in the method response.
failed-uri = "Failed-URI" ":" absoluteURI CRLF
When a synthesizer method needs a synthesizer to fetch or access a URI and the access fails, the server MUST provide the URI-specific or protocol-specific response code for the URI in the Failed-URI header field in the method response through this header field. The value encoding is UTF-8 (RFC 3629 [RFC3629]) to accommodate any access protocol -- some access protocols might have a response string instead of a numeric response code.
failed-uri-cause = "Failed-URI-Cause" ":" 1*UTFCHAR CRLF
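An informative example (the request-id, URI, and cause text are hypothetical) of a SPEAK-COMPLETE event reporting a failed URI access along with the protocol-specific cause:

S->C: MRCP/2.0 ... SPEAK-COMPLETE 543272 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth
      Completion-Cause:003 uri-failure
      Failed-URI:http://www.example.com/prompts/welcome.ssml
      Failed-URI-Cause:404 Not Found
      Speech-Marker:timestamp=857206027059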
When a client issues a CONTROL request to a currently speaking synthesizer resource to jump backward, and the target jump point is before the start of the current SPEAK request, the current SPEAK request MUST restart from the beginning of its speech data and the server's response to the CONTROL request MUST contain this header field with a value of "true" indicating a restart.
speak-restart = "Speak-Restart" ":" BOOLEAN CRLF
This header field MAY be specified in a CONTROL method to control the maximum length of speech to speak, relative to the current speaking point in the currently active SPEAK request. If numeric, the value MUST be a positive integer. If a header field with a Tag unit is specified, then the speech output continues until the tag is reached or the SPEAK request is completed, whichever comes first. This header field MAY be specified in a SPEAK request to indicate the length to speak from the speech data and is relative to the point in speech that the SPEAK request starts. The different speech length units supported are synthesizer implementation dependent. If a server does not support the specified unit, the server MUST respond with a status-code of 409 "Unsupported Header Field Value".
speak-length = "Speak-Length" ":" positive-length-value CRLF
positive-length-value = positive-speech-length / text-speech-length
text-speech-length = 1*UTFCHAR SP "Tag"
positive-speech-length = 1*19DIGIT SP numeric-speech-unit
numeric-speech-unit = "Second" / "Word" / "Sentence" / "Paragraph"
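As an informative example, a client might cap the remaining output of the active SPEAK request at two sentences (values are illustrative; the unit is singular per the ABNF above):

C->S: MRCP/2.0 ... CONTROL 543263
      Channel-Identifier:32AECB23433802@speechsynth
      Speak-Length:2 Sentence

S->C: MRCP/2.0 ... 543263 200 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth
      Active-Request-Id-List:543258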
This header field is used to indicate whether a lexicon has to be loaded or unloaded. The value "true" means to load the lexicon if not already loaded, and the value "false" means to unload the lexicon if it is loaded. The default value for this header field is "true". This header field MAY be specified in a DEFINE-LEXICON method.
load-lexicon = "Load-Lexicon" ":" BOOLEAN CRLF
This header field is used to specify a list of active pronunciation lexicon URIs and the search order among the active lexicons. Lexicons specified within the SSML document take precedence over the lexicons specified in this header field. This header field MAY be specified in the SPEAK, SET-PARAMS, and GET-PARAMS methods.
lexicon-search-order = "Lexicon-Search-Order" ":" "<" absoluteURI ">" *(" " "<" absoluteURI ">") CRLF
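An informative example of this header field, with hypothetical lexicon URIs:

Lexicon-Search-Order:<http://www.example.com/lexicons/names.pls> <http://www.example.com/lexicons/places.pls>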
A synthesizer message can contain additional information associated with the Request, Response, or Event in its message body.
Marked-up text for the synthesizer to speak is specified as a typed media entity in the message body. The speech data to be spoken by the synthesizer can be specified inline by embedding the data in the message body or by reference by providing a URI for accessing the data. In either case, the data and the format used to mark up the speech need to be of a content type supported by the server.
All MRCPv2 servers containing synthesizer resources MUST support both plain text speech data and W3C's Speech Synthesis Markup Language [W3C.REC-speech-synthesis-20040907] and hence MUST support the media types 'text/plain' and 'application/ssml+xml'. Other formats MAY be supported.
If the speech data is to be fetched by URI reference, the media type 'text/uri-list' (see RFC 2483 [RFC2483]) is used to indicate one or more URIs that, when dereferenced, will contain the content to be spoken. If a list of speech URIs is specified, the resource MUST speak the speech data provided by each URI in the order in which the URIs are specified in the content.
MRCPv2 clients and servers MUST support the 'multipart/mixed' media type. This is the appropriate media type to use when providing a mix of URI and inline speech data. Embedded within the multipart content block, there MAY be content for the 'text/uri-list', 'application/ssml+xml', and/or 'text/plain' media types. The character set and encoding used in the speech data is specified according to standard media type definitions. The multipart content MAY also contain actual audio data. Clients may have pre-recorded audio clips stored in memory or on a local device and wish to play them as part of the SPEAK request. The audio portions MAY be sent by the client as part of the multipart content block. This audio is referenced in the speech markup data that is another part in the multipart content block according to the 'multipart/mixed' media type specification.
Content-Type:text/uri-list
Content-Length:...

http://www.example.com/ASR-Introduction.ssml
http://www.example.com/ASR-Document-Part1.ssml
http://www.example.com/ASR-Document-Part2.ssml
http://www.example.com/ASR-Conclusion.ssml

URI List Example
Content-Type:application/ssml+xml
Content-Length:...
<?xml version="1.0"?> <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US"> <p> <s>You have 4 new messages.</s> <s>The first is from Aldine Turnbet and arrived at <break/> <say-as interpret-as="vxml:time">0345p</say-as>.</s>
<s>The subject is <prosody rate="-20%">ski trip</prosody></s> </p> </speak>
SSML Example
Content-Type:multipart/mixed; boundary="break"
--break
Content-Type:text/uri-list
Content-Length:...

http://www.example.com/ASR-Introduction.ssml
http://www.example.com/ASR-Document-Part1.ssml
http://www.example.com/ASR-Document-Part2.ssml
http://www.example.com/ASR-Conclusion.ssml

--break
Content-Type:application/ssml+xml
Content-Length:...
<?xml version="1.0"?> <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US"> <p> <s>You have 4 new messages.</s> <s>The first is from Stephanie Williams and arrived at <break/> <say-as interpret-as="vxml:time">0342p</say-as>.</s>
<s>The subject is <prosody rate="-20%">ski trip</prosody></s> </p> </speak>
--break--

Multipart Example
Synthesizer lexicon data from the client to the server can be provided inline or by reference. Either way, they are carried as typed media in the message body of the MRCPv2 request message (see Section 8.14).
When a lexicon is specified inline in the message, the client MUST provide a Content-ID for that lexicon as part of the content header fields. The server MUST store the lexicon associated with that Content-ID for the duration of the session. A stored lexicon can be overwritten by defining a new lexicon with the same Content-ID.
Lexicons that have been associated with a Content-ID can be referenced through the 'session' URI scheme (see Section 13.6).
If lexicon data is specified by external URI reference, the media type 'text/uri-list' (see RFC 2483 [RFC2483] ) is used to list the one or more URIs that may be dereferenced to obtain the lexicon data. All MRCPv2 servers MUST support the "http" and "https" URI access mechanisms, and MAY support other mechanisms.
If the data in the message body consists of a mix of URI and inline lexicon data, the 'multipart/mixed' media type is used. The character set and encoding used in the lexicon data may be specified according to standard media type definitions.
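As a non-normative sketch, lexicon data supplied by reference could look like the following DEFINE-LEXICON exchange; the lexicon URI and request-id are hypothetical:

C->S: MRCP/2.0 ... DEFINE-LEXICON 543268
      Channel-Identifier:32AECB23433802@speechsynth
      Content-Type:text/uri-list
      Content-Length:...

      http://www.example.com/lexicons/names.pls

S->C: MRCP/2.0 ... 543268 200 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth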
The SPEAK request provides the synthesizer resource with the speech text and initiates speech synthesis and streaming. The SPEAK method MAY carry voice and prosody header fields that alter the behavior of the voice being synthesized, as well as a typed media message body containing the actual marked-up text to be spoken.
The SPEAK method implementation MUST do a fetch of all external URIs that are part of that operation. If caching is implemented, this URI fetching MUST conform to the cache-control hints and parameter header fields associated with the method in deciding whether it is to be fetched from cache or from the external server. If these hints/parameters are not specified in the method, the values set for the session using SET-PARAMS/GET-PARAMS apply. If they were not set for the session, their default values apply.
When applying voice parameters, there are three levels of precedence. The highest precedence are those specified within the speech markup text, followed by those specified in the header fields of the SPEAK request and hence that apply for that SPEAK request only, followed by the session default values that can be set using the SET-PARAMS request and apply for subsequent methods invoked during the session.
If the resource was idle at the time the SPEAK request arrived at the server and the SPEAK method is being actively processed, the resource responds immediately with a success status code and a request-state of IN-PROGRESS.
If the resource is in the speaking or paused state when the SPEAK method arrives at the server, i.e., it is in the middle of processing a previous SPEAK request, the status returns success with a request-state of PENDING. The server places the SPEAK request in the synthesizer resource request queue. The request queue operates strictly FIFO: requests are processed serially in order of receipt. If the current SPEAK fails, all SPEAK methods in the pending queue are cancelled and each generates a SPEAK-COMPLETE event with a Completion-Cause of "cancelled".
For the synthesizer resource, SPEAK is the only method that can return a request-state of IN-PROGRESS or PENDING. When the text has been synthesized and played into the media stream, the resource issues a SPEAK-COMPLETE event with the request-id of the SPEAK request and a request-state of COMPLETE.
C->S: MRCP/2.0 ... SPEAK 543257
      Channel-Identifier:32AECB23433802@speechsynth
      Voice-gender:neutral
      Voice-Age:25
      Prosody-volume:medium
      Content-Type:application/ssml+xml
      Content-Length:...
<?xml version="1.0"?> <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US"> <p> <s>You have 4 new messages.</s> <s>The first is from Stephanie Williams and arrived at <break/> <say-as interpret-as="vxml:time">0342p</say-as>. </s> <s>The subject is <prosody rate="-20%">ski trip</prosody> </s> </p> </speak>
S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS
      Channel-Identifier:32AECB23433802@speechsynth
      Speech-Marker:timestamp=857206027059

S->C: MRCP/2.0 ... SPEAK-COMPLETE 543257 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth
      Completion-Cause:000 normal
      Speech-Marker:timestamp=857206027059
SPEAK Example
The STOP method from the client to the server tells the synthesizer resource to stop speaking if it is speaking something.
The STOP request can be sent with an Active-Request-Id-List header field to stop the zero or more specific SPEAK requests that may be in queue and return a response status-code of 200 "Success". If no Active-Request-Id-List header field is sent in the STOP request, the server terminates all outstanding SPEAK requests.
If a STOP request successfully terminated one or more PENDING or IN-PROGRESS SPEAK requests, then the response MUST contain an Active-Request-Id-List header field enumerating the SPEAK request-ids that were terminated. Otherwise, there is no Active-Request-Id-List header field in the response. No SPEAK-COMPLETE events are sent for such terminated requests.
If a SPEAK request that was IN-PROGRESS and speaking was stopped, the next pending SPEAK request, if any, becomes IN-PROGRESS at the resource and enters the speaking state.
If a SPEAK request that was IN-PROGRESS and paused was stopped, the next pending SPEAK request, if any, becomes IN-PROGRESS and enters the paused state.
C->S: MRCP/2.0 ... SPEAK 543258
      Channel-Identifier:32AECB23433802@speechsynth
      Content-Type:application/ssml+xml
      Content-Length:...
<?xml version="1.0"?> <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US"> <p> <s>You have 4 new messages.</s> <s>The first is from Stephanie Williams and arrived at <break/> <say-as interpret-as="vxml:time">0342p</say-as>.</s> <s>The subject is <prosody rate="-20%">ski trip</prosody></s> </p> </speak>
S->C: MRCP/2.0 ... 543258 200 IN-PROGRESS
      Channel-Identifier:32AECB23433802@speechsynth
      Speech-Marker:timestamp=857206027059

C->S: MRCP/2.0 ... STOP 543259
      Channel-Identifier:32AECB23433802@speechsynth

S->C: MRCP/2.0 ... 543259 200 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth
      Active-Request-Id-List:543258
      Speech-Marker:timestamp=857206039059
STOP Example
The BARGE-IN-OCCURRED method, when used with the synthesizer resource, provides a client that has detected a barge-in-able event a means to communicate the occurrence of the event to the synthesizer resource.
This method is useful in two scenarios:
1. The client has detected DTMF digits in the input media or some other barge-in-able event and wants to communicate that to the synthesizer resource.
2. The recognizer resource and the synthesizer resource are in different servers. In this case, the client acts as an intermediary for the two servers. It receives an event from the recognition resource and sends a BARGE-IN-OCCURRED request to the synthesizer. In such cases, the BARGE-IN-OCCURRED method would also have a Proxy-Sync-Id header field received from the resource generating the original event.
If a SPEAK request is active with kill-on-barge-in enabled (see Section 8.4.2), and the BARGE-IN-OCCURRED event is received, the synthesizer MUST immediately stop streaming out audio. It MUST also terminate any speech requests queued behind the current active one, irrespective of whether or not they have barge-in enabled. If a barge-in-able SPEAK request was playing and it was terminated, the response MUST contain an Active-Request-Id-List header field listing the request-ids of all SPEAK requests that were terminated. The server generates no SPEAK-COMPLETE events for these requests.
If there were no SPEAK requests terminated by the synthesizer resource as a result of the BARGE-IN-OCCURRED method, the server MUST respond to the BARGE-IN-OCCURRED with a status-code of 200 "Success", and the response MUST NOT contain an Active-Request-Id-List header field.
If the synthesizer and recognizer resources are part of the same MRCPv2 session, they can be optimized for a quicker kill-on-barge-in response if the recognizer and synthesizer interact directly. In these cases, the client MUST still react to a START-OF-INPUT event from the recognizer by invoking the BARGE-IN-OCCURRED method to the synthesizer. The client MUST invoke the BARGE-IN-OCCURRED if it has any outstanding requests to the synthesizer resource in either the PENDING or IN-PROGRESS state.
C->S: MRCP/2.0 ... SPEAK 543258
      Channel-Identifier:32AECB23433802@speechsynth
      Voice-gender:neutral
      Voice-Age:25
      Prosody-volume:medium
      Content-Type:application/ssml+xml
      Content-Length:...
<?xml version="1.0"?> <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US"> <p> <s>You have 4 new messages.</s> <s>The first is from Stephanie Williams and arrived at <break/> <say-as interpret-as="vxml:time">0342p</say-as>.</s> <s>The subject is <prosody rate="-20%">ski trip</prosody></s> </p> </speak>
S->C: MRCP/2.0 ... 543258 200 IN-PROGRESS
      Channel-Identifier:32AECB23433802@speechsynth
      Speech-Marker:timestamp=857206027059

C->S: MRCP/2.0 ... BARGE-IN-OCCURRED 543259
      Channel-Identifier:32AECB23433802@speechsynth
      Proxy-Sync-Id:987654321

S->C: MRCP/2.0 ... 543259 200 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth
      Active-Request-Id-List:543258
      Speech-Marker:timestamp=857206039059
BARGE-IN-OCCURRED Example
The PAUSE method from the client to the server tells the synthesizer resource to pause speech output if it is speaking something. If a PAUSE method is issued on a session when a SPEAK is not active, the server MUST respond with a status-code of 402 "Method not valid in this state". If a PAUSE method is issued on a session when a SPEAK is active and paused, the server MUST respond with a status-code of 200 "Success". If a SPEAK request was active, the server MUST return an Active-Request-Id-List header field whose value contains the request-id of the SPEAK request that was paused.
C->S: MRCP/2.0 ... SPEAK 543258
      Channel-Identifier:32AECB23433802@speechsynth
      Voice-gender:neutral
      Voice-Age:25
      Prosody-volume:medium
      Content-Type:application/ssml+xml
      Content-Length:...
<?xml version="1.0"?> <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US"> <p> <s>You have 4 new messages.</s> <s>The first is from Stephanie Williams and arrived at <break/> <say-as interpret-as="vxml:time">0342p</say-as>.</s>
<s>The subject is <prosody rate="-20%">ski trip</prosody></s> </p> </speak>
S->C: MRCP/2.0 ... 543258 200 IN-PROGRESS
      Channel-Identifier:32AECB23433802@speechsynth
      Speech-Marker:timestamp=857206027059

C->S: MRCP/2.0 ... PAUSE 543259
      Channel-Identifier:32AECB23433802@speechsynth

S->C: MRCP/2.0 ... 543259 200 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth
      Active-Request-Id-List:543258
PAUSE Example
The RESUME method from the client to the server tells a paused synthesizer resource to resume speaking. If a RESUME request is issued on a session with no active SPEAK request, the server MUST respond with a status-code of 402 "Method not valid in this state". If a RESUME request is issued on a session with an active SPEAK request that is speaking (i.e., not paused), the server MUST respond with a status-code of 200 "Success". If a SPEAK request was paused, the server MUST return an Active-Request-Id-List header field whose value contains the request-id of the SPEAK request that was resumed.
C->S: MRCP/2.0 ... SPEAK 543258
      Channel-Identifier:32AECB23433802@speechsynth
      Voice-gender:neutral
      Voice-age:25
      Prosody-volume:medium
      Content-Type:application/ssml+xml
      Content-Length:...
<?xml version="1.0"?> <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US"> <p> <s>You have 4 new messages.</s> <s>The first is from Stephanie Williams and arrived at <break/> <say-as interpret-as="vxml:time">0342p</say-as>.</s> <s>The subject is <prosody rate="-20%">ski trip</prosody></s> </p> </speak>
S->C: MRCP/2.0 ... 543258 200 IN-PROGRESS
      Channel-Identifier:32AECB23433802@speechsynth
      Speech-Marker:timestamp=857206027059
C->S: MRCP/2.0 ... PAUSE 543259
      Channel-Identifier:32AECB23433802@speechsynth

S->C: MRCP/2.0 ... 543259 200 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth
      Active-Request-Id-List:543258

C->S: MRCP/2.0 ... RESUME 543260
      Channel-Identifier:32AECB23433802@speechsynth

S->C: MRCP/2.0 ... 543260 200 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth
      Active-Request-Id-List:543258
RESUME Example
The CONTROL method from the client to the server tells a synthesizer that is speaking to modify what it is speaking on the fly. This method is used to request the synthesizer to jump forward or backward in what it is speaking, change speaker rate, speaker parameters, etc. It affects only the currently IN-PROGRESS SPEAK request. Depending on the implementation and capability of the synthesizer resource, it may or may not support the various modifications indicated by header fields in the CONTROL request.
When a client invokes a CONTROL method to jump forward and the operation goes beyond the end of the active SPEAK method's text, the CONTROL request still succeeds. The active SPEAK request completes and returns a SPEAK-COMPLETE event following the response to the CONTROL method. If there are more SPEAK requests in the queue, the synthesizer resource starts at the beginning of the next SPEAK request in the queue.
When a client invokes a CONTROL method to jump backward and the operation jumps to the beginning or beyond the beginning of the speech data of the active SPEAK method, the CONTROL request still succeeds. The response to the CONTROL request contains the speak-restart header field, and the active SPEAK request restarts from the beginning of its speech data.
These two behaviors can be used to rewind or fast-forward across multiple speech requests, if the client wants to break up a speech markup text into multiple SPEAK requests.
If a SPEAK request was active when the CONTROL method was received, the server MUST return an Active-Request-Id-List header field containing the request-id of the SPEAK request that was active.
C->S: MRCP/2.0 ... SPEAK 543258
      Channel-Identifier:32AECB23433802@speechsynth
      Voice-gender:neutral
      Voice-age:25
      Prosody-volume:medium
      Content-Type:application/ssml+xml
      Content-Length:...
<?xml version="1.0"?> <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US"> <p> <s>You have 4 new messages.</s> <s>The first is from Stephanie Williams and arrived at <break/> <say-as interpret-as="vxml:time">0342p</say-as>.</s>
<s>The subject is <prosody rate="-20%">ski trip</prosody></s> </p> </speak>
S->C: MRCP/2.0 ... 543258 200 IN-PROGRESS
      Channel-Identifier:32AECB23433802@speechsynth
      Speech-Marker:timestamp=857205016059

C->S: MRCP/2.0 ... CONTROL 543259
      Channel-Identifier:32AECB23433802@speechsynth
      Prosody-rate:fast

S->C: MRCP/2.0 ... 543259 200 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth
      Active-Request-Id-List:543258
      Speech-Marker:timestamp=857206027059

C->S: MRCP/2.0 ... CONTROL 543260
      Channel-Identifier:32AECB23433802@speechsynth
      Jump-Size:-15 Words

S->C: MRCP/2.0 ... 543260 200 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth
      Active-Request-Id-List:543258
      Speech-Marker:timestamp=857206039059
CONTROL Example
This is an Event message from the synthesizer resource to the client that indicates the corresponding SPEAK request was completed. The request-id field matches the request-id of the SPEAK request that initiated the speech that just completed. The request-state field is set to COMPLETE by the server, indicating that this is the last event with the corresponding request-id. The Completion-Cause header field specifies the cause code pertaining to the status and reason of request completion, such as the SPEAK completed normally or because of an error, kill-on-barge-in, etc.
C->S: MRCP/2.0 ... SPEAK 543260
      Channel-Identifier:32AECB23433802@speechsynth
      Voice-gender:neutral
      Voice-age:25
      Prosody-volume:medium
      Content-Type:application/ssml+xml
      Content-Length:...
<?xml version="1.0"?> <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US"> <p> <s>You have 4 new messages.</s> <s>The first is from Stephanie Williams and arrived at <break/> <say-as interpret-as="vxml:time">0342p</say-as>.</s> <s>The subject is <prosody rate="-20%">ski trip</prosody></s> </p> </speak>
S->C: MRCP/2.0 ... 543260 200 IN-PROGRESS
      Channel-Identifier:32AECB23433802@speechsynth
      Speech-Marker:timestamp=857206027059

S->C: MRCP/2.0 ... SPEAK-COMPLETE 543260 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth
      Completion-Cause:000 normal
      Speech-Marker:timestamp=857206039059
SPEAK-COMPLETE Example
This is an event generated by the synthesizer resource to the client when the synthesizer encounters a marker tag in the speech markup it is currently processing. The value of the request-id field MUST match that of the corresponding SPEAK request. The request-state field MUST have the value "IN-PROGRESS" as the speech is still not complete. The value of the speech marker tag hit, describing where the synthesizer is in the speech markup, MUST be returned in the Speech-Marker header field, along with an NTP timestamp indicating the instant in the output speech stream that the marker was encountered. The SPEECH-MARKER event MUST also be generated with a null marker value and output NTP timestamp when a SPEAK request in Pending-State (i.e., in the queue) changes state to IN-PROGRESS and starts speaking. The NTP timestamp MUST be synchronized with the RTP timestamp used to generate the speech stream through standard RTCP machinery.
C->S: MRCP/2.0 ... SPEAK 543261
      Channel-Identifier:32AECB23433802@speechsynth
      Voice-gender:neutral
      Voice-age:25
      Prosody-volume:medium
      Content-Type:application/ssml+xml
      Content-Length:...
<?xml version="1.0"?> <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US"> <p> <s>You have 4 new messages.</s> <s>The first is from Stephanie Williams and arrived at <break/>
<say-as interpret-as="vxml:time">0342p</say-as>.</s> <mark name="here"/> <s>The subject is <prosody rate="-20%">ski trip</prosody> </s> <mark name="ANSWER"/> </p> </speak>
S->C: MRCP/2.0 ... 543261 200 IN-PROGRESS
      Channel-Identifier:32AECB23433802@speechsynth
      Speech-Marker:timestamp=857205015059

S->C: MRCP/2.0 ... SPEECH-MARKER 543261 IN-PROGRESS
      Channel-Identifier:32AECB23433802@speechsynth
      Speech-Marker:timestamp=857206027059;here

S->C: MRCP/2.0 ... SPEECH-MARKER 543261 IN-PROGRESS
      Channel-Identifier:32AECB23433802@speechsynth
      Speech-Marker:timestamp=857206039059;ANSWER

S->C: MRCP/2.0 ... SPEAK-COMPLETE 543261 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth
      Completion-Cause:000 normal
      Speech-Marker:timestamp=857207689259;ANSWER
SPEECH-MARKER Example
The DEFINE-LEXICON method, from the client to the server, provides a lexicon and tells the server to load or unload the lexicon (see Section 8.4.16). The media type of the lexicon is provided in the Content-Type header (see Section 8.5.2). One such media type is "application/pls+xml" for the Pronunciation Lexicon Specification (PLS) [W3C.REC-pronunciation-lexicon-20081014] [RFC4267].
If the server resource is in the speaking or paused state, the server MUST respond with a failure status-code of 402 "Method not valid in this state".
If the resource is in the idle state and is able to successfully load/unload the lexicon, the status MUST return a 200 "Success" status-code and the request-state MUST be COMPLETE.
If the synthesizer could not define the lexicon for some reason, for example, because the download failed or the lexicon was in an unsupported form, the server MUST respond with a failure status-code of 407 and a Completion-Cause header field describing the failure reason.
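The following non-normative sketch shows an inline lexicon definition and a successful response; the Content-ID, request-id, and PLS content are hypothetical:

C->S: MRCP/2.0 ... DEFINE-LEXICON 543269
      Channel-Identifier:32AECB23433802@speechsynth
      Content-ID:<names-lexicon@form-level.store>
      Content-Type:application/pls+xml
      Content-Length:...

      <?xml version="1.0" encoding="UTF-8"?>
      <lexicon version="1.0"
               xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
               alphabet="ipa" xml:lang="en-US">
        <lexeme>
          <grapheme>Dr.</grapheme>
          <alias>Doctor</alias>
        </lexeme>
      </lexicon>

S->C: MRCP/2.0 ... 543269 200 COMPLETE
      Channel-Identifier:32AECB23433802@speechsynth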
The speech recognizer resource receives an incoming voice stream and provides the client with an interpretation of what was spoken in textual form.
The recognizer resource is controlled by MRCPv2 requests from the client. The recognizer resource can both respond to these requests and generate asynchronous events to the client to indicate conditions of interest during the processing of the method.
This section applies to the following resource types.
1. speechrecog
2. dtmfrecog
The difference between the above two resources is in their level of support for recognition grammars. The "dtmfrecog" resource type is capable of recognizing only DTMF digits and hence accepts only DTMF grammars. It only generates barge-in for DTMF inputs and ignores speech. The "speechrecog" resource type can recognize regular speech as well as DTMF digits and hence MUST support grammars describing either speech or DTMF. This resource generates barge-in events for speech and/or DTMF. By analyzing the grammars that are activated by the RECOGNIZE method, it determines if a barge-in should occur for speech and/or DTMF. When the recognizer decides it needs to generate a barge-in, it also generates a START-OF-INPUT event to the client. The recognizer resource MAY support recognition in the normal or hotword modes or both (although note that a single "speechrecog" resource does not perform normal and hotword mode recognition simultaneously). For implementations where a single recognizer resource does not support both modes, or simultaneous normal and hotword recognition is desired, the two modes can be invoked through separate resources allocated to the same SIP dialog (with different MRCP session identifiers) and share the RTP audio feed.
The capabilities of the recognizer resource are enumerated below:
Normal Mode Recognition
   Normal mode recognition tries to match all of the speech or DTMF against the grammar and returns a no-match status if the input fails to match or the method times out.

Hotword Mode Recognition
   Hotword mode is where the recognizer looks for a match against specific speech grammar or DTMF sequence and ignores speech or DTMF that does not match. The recognition completes only if there is a successful match of grammar, if the client cancels the request, or if there is a non-input or recognition timeout.
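Purely as an illustrative sketch, a client could request hotword recognition against an externally referenced grammar as follows; the grammar URI, request-id, and channel identifier are hypothetical:

C->S: MRCP/2.0 ... RECOGNIZE 543270
      Channel-Identifier:32AECB23433801@speechrecog
      Recognition-Mode:hotword
      Content-Type:text/uri-list
      Content-Length:...

      http://www.example.com/grammars/stop-playback.grxml

S->C: MRCP/2.0 ... 543270 200 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechrecog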
Voice Enrolled Grammars
   A recognizer resource MAY optionally support Voice Enrolled Grammars. With this functionality, enrollment is performed using a person's voice. For example, a list of contacts can be created and maintained by recording the person's names using the caller's voice. This technique is sometimes also called speaker-dependent recognition.

Interpretation
   A recognizer resource MAY be employed strictly for its natural language interpretation capabilities by supplying it with a text string as input instead of speech. In this mode, the resource takes text as input and produces an "interpretation" of the input according to the supplied grammar.
Voice enrollment has the concept of an enrollment session. A session to add a new phrase to a personal grammar involves the initial enrollment followed by a repeat of enough utterances before committing the new phrase to the personal grammar. Each time an utterance is recorded, it is compared for similarity with the other samples and a clash test is performed against other entries in the personal grammar to ensure there are no similar and confusable entries.
Enrollment is done using a recognizer resource. Controlling which utterances are to be considered for enrollment of a new phrase is done by setting a header field (see Section 9.4.39) in the RECOGNIZE request.
Interpretation is accomplished through the INTERPRET method (Section 9.20) and the Interpret-Text header field (Section 9.4.30).
The recognizer resource maintains a state machine to process MRCPv2 requests from the client.
Idle                   Recognizing                Recognized
State                  State                      State
 |                       |                          |
 |---------RECOGNIZE---->|---RECOGNITION-COMPLETE-->|
 |<------STOP------------|<-----RECOGNIZE-----------|
 |                       |                          |
 |              |--------|              |-----------|
 |       START-OF-INPUT  |              | GET-RESULT|
 |              |------->|              |---------->|
 |------------|          |                          |
 | DEFINE-GRAMMAR        |----------|               |
 |<-----------|          | START-INPUT-TIMERS       |
 |                       |<---------|               |
 |------|                |                          |
 | INTERPRET             |------|                   |
 |<-----|                | RECOGNIZE                |
 |                       |<-----|                   |
 |-------|               |                          |
 | STOP  |               |                          |
 |<------|               |                          |
 |<-------------------STOP--------------------------|
 |<-------------------DEFINE-GRAMMAR----------------|
Recognizer State Machine
If a recognizer resource supports voice enrolled grammars, starting an enrollment session does not change the state of the recognizer resource. Once an enrollment session is started, then utterances are enrolled by calling the RECOGNIZE method repeatedly. The state of the speech recognizer resource goes from IDLE to RECOGNIZING state each time RECOGNIZE is called.
The recognizer supports the following methods.
recognizer-method = recog-only-method / enrollment-method
recog-only-method = "DEFINE-GRAMMAR" / "RECOGNIZE" / "INTERPRET" / "GET-RESULT" / "START-INPUT-TIMERS" / "STOP"
It is OPTIONAL for a recognizer resource to support voice enrolled grammars. If the recognizer resource does support voice enrolled grammars, it MUST support the following methods.
enrollment-method = "START-PHRASE-ENROLLMENT" / "ENROLLMENT-ROLLBACK" / "END-PHRASE-ENROLLMENT" / "MODIFY-PHRASE" / "DELETE-PHRASE"
The recognizer can generate the following events.
recognizer-event = "START-OF-INPUT" / "RECOGNITION-COMPLETE" / "INTERPRETATION-COMPLETE"
A recognizer message can contain header fields containing request options and information to augment the Method, Response, or Event message it is associated with.
recognizer-header = recog-only-header / enrollment-header
recog-only-header = confidence-threshold / sensitivity-level / speed-vs-accuracy / n-best-list-length / no-input-timeout / input-type / recognition-timeout / waveform-uri / input-waveform-uri / completion-cause / completion-reason / recognizer-context-block / start-input-timers / speech-complete-timeout
/ speech-incomplete-timeout / dtmf-interdigit-timeout / dtmf-term-timeout / dtmf-term-char / failed-uri / failed-uri-cause / save-waveform / media-type / new-audio-channel / speech-language / ver-buffer-utterance / recognition-mode / cancel-if-queue / hotword-max-duration / hotword-min-duration / interpret-text / dtmf-buffer-time / clear-dtmf-buffer / early-no-match
If a recognizer resource supports voice enrolled grammars, the following header fields are also used.
enrollment-header = num-min-consistent-pronunciations / consistency-threshold / clash-threshold / personal-grammar-uri / enroll-utterance / phrase-id / phrase-nl / weight / save-best-waveform / new-phrase-id / confusable-phrases-uri / abort-phrase-enrollment
For enrollment-specific header fields that can appear as part of SET-PARAMS or GET-PARAMS methods, the following general rule applies: the START-PHRASE-ENROLLMENT method MUST be invoked before these header fields may be set through the SET-PARAMS method or retrieved through the GET-PARAMS method.
Note that the Waveform-URI header field of the Recognizer resource can also appear in the response to the END-PHRASE-ENROLLMENT method.
When a recognizer resource recognizes or matches a spoken phrase with some portion of the grammar, it associates a confidence level with that match. The Confidence-Threshold header field tells the recognizer resource what confidence level the client considers a successful match. This is a float value between 0.0-1.0 indicating the recognizer's confidence in the recognition. If the recognizer determines that there is no candidate match with a confidence that is greater than the confidence threshold, then it MUST return no-match as the recognition result. This header field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS. The default value for this header field is implementation specific, as is the interpretation of any specific value for this header field. Although values for servers from different vendors are not comparable, it is expected that clients will tune this value over time for a given server.
confidence-threshold = "Confidence-Threshold" ":" FLOAT CRLF
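For illustration, a client that only wants fairly certain results might set a session-wide threshold with SET-PARAMS; the value shown is arbitrary and, as noted above, not comparable across vendors:

C->S: MRCP/2.0 ... SET-PARAMS 543271
      Channel-Identifier:32AECB23433801@speechrecog
      Confidence-Threshold:0.7

S->C: MRCP/2.0 ... 543271 200 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog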
To filter out background noise and not mistake it for speech, the recognizer resource supports a variable level of sound sensitivity. The Sensitivity-Level header field is a float value between 0.0 and 1.0 and allows the client to set the sensitivity level for the recognizer. This header field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS. A higher value for this header field means higher sensitivity. The default value for this header field is implementation specific, as is the interpretation of any specific value for this header field. Although values for servers from different vendors are not comparable, it is expected that clients will tune this value over time for a given server.
sensitivity-level = "Sensitivity-Level" ":" FLOAT CRLF
Depending on the implementation and capability of the recognizer resource, it may be tunable towards performance or accuracy. Higher accuracy may mean more processing and higher CPU utilization, meaning fewer active sessions per server, and vice versa. The value is a float between 0.0 and 1.0. A value of 0.0 means fastest recognition. A value of 1.0 means best accuracy. This header field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS. The default value for this header field is implementation specific. Although values for servers from different vendors are not comparable, it is expected that clients will tune this value over time for a given server.
speed-vs-accuracy = "Speed-Vs-Accuracy" ":" FLOAT CRLF
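An informative sketch combining the Sensitivity-Level and Speed-Vs-Accuracy tuning header fields in one SET-PARAMS request; the values are arbitrary:

C->S: MRCP/2.0 ... SET-PARAMS 543272
      Channel-Identifier:32AECB23433801@speechrecog
      Sensitivity-Level:0.5
      Speed-Vs-Accuracy:0.8

S->C: MRCP/2.0 ... 543272 200 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog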
When the recognizer matches an incoming stream with the grammar, it may come up with more than one alternative match because of confidence levels in certain words or conversation paths. If this header field is not specified, by default, the recognizer resource returns only the best match above the confidence threshold. The client, by setting this header field, can ask the recognition resource to send it more than one alternative. All alternatives must still be above the Confidence-Threshold. A value greater than one does not guarantee that the recognizer will provide the requested number of alternatives. This header field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS. The minimum value for this header field is 1. The default value for this header field is 1.
n-best-list-length = "N-Best-List-Length" ":" 1*19DIGIT CRLF
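As an illustrative example, a RECOGNIZE request asking for up to three alternatives; the grammar URI is hypothetical:

C->S: MRCP/2.0 ... RECOGNIZE 543273
      Channel-Identifier:32AECB23433801@speechrecog
      N-Best-List-Length:3
      Content-Type:text/uri-list
      Content-Length:...

      http://www.example.com/grammars/cities.grxml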
When the recognizer detects barge-in-able input and generates a START-OF-INPUT event, that event MUST carry this header field to specify whether the input that caused the barge-in was DTMF or speech.
input-type = "Input-Type" ":" inputs CRLF inputs = "speech" / "dtmf"
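A non-normative sketch of a START-OF-INPUT event triggered by DTMF; the request-id refers to a hypothetical in-progress RECOGNIZE request:

S->C: MRCP/2.0 ... START-OF-INPUT 543274 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechrecog
      Input-Type:dtmf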
When recognition is started and there is no speech detected for a certain period of time, the recognizer can send a RECOGNITION-COMPLETE event to the client with a Completion-Cause of "no-input-timeout" and terminate the recognition operation. The client can use the No-Input-Timeout header field to set this timeout. The value is in milliseconds and can range from 0 to an implementation-specific maximum value. This header field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS. The default value is implementation specific.
no-input-timeout = "No-Input-Timeout" ":" 1*19DIGIT CRLF
When recognition is started and there is no match for a certain period of time, the recognizer can send a RECOGNITION-COMPLETE event to the client and terminate the recognition operation. The Recognition-Timeout header field allows the client to set this timeout value. The value is in milliseconds. The value for this header field ranges from 0 to an implementation-specific maximum value. The default value is 10 seconds. This header field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS.
recognition-timeout = "Recognition-Timeout" ":" 1*19DIGIT CRLF
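The sketch below shows how these header fields might be combined on a single RECOGNIZE request. The grammar URI, channel identifier, request ID, and timeout values are illustrative assumptions only; Cancel-If-Queue, defined below, is included because it is mandatory on every RECOGNIZE request.

   C->S:  MRCP/2.0 ... RECOGNIZE 543261
          Channel-Identifier:32AECB23433801@speechrecog
          Cancel-If-Queue:false
          N-Best-List-Length:3
          No-Input-Timeout:5000
          Recognition-Timeout:20000
          Content-Type:text/uri-list
          Content-Length:...

          http://www.example.com/directory.grxml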
If the Save-Waveform header field is set to "true", the recognizer MUST record the incoming audio stream of the recognition into a stored form and provide a URI for the client to access it. This header field MUST be present in the RECOGNITION-COMPLETE event if the Save-Waveform header field was set to "true". The value of the header field MUST be empty if there was some error condition preventing the server from recording. Otherwise, the URI generated by the server MUST be unambiguous across the server and all its recognition sessions. The content associated with the URI MUST be available to the client until the MRCPv2 session terminates.
如果Save Waveform header(保存波形标题)字段设置为“true”,则识别器必须将识别的传入音频流记录到存储表单中,并提供URI供客户端访问。如果Save Waveform header字段设置为“true”,则此header字段必须出现在RECOGNITION-COMPLETE事件中。如果存在阻止服务器录制的错误情况,则标头字段的值必须为空。否则,服务器生成的URI在整个服务器及其所有识别会话中必须是明确的。在MRCPv2会话终止之前,与URI关联的内容必须对客户端可用。
Similarly, if the Save-Best-Waveform header field is set to "true", the recognizer MUST save the audio stream for the best repetition of the phrase that was used during the enrollment session. The recognizer MUST then record the recognized audio and make it available to the client by returning a URI in the Waveform-URI header field in the response to the END-PHRASE-ENROLLMENT method. The value of the header field MUST be empty if there was some error condition preventing the server from recording. Otherwise, the URI generated by the server MUST be unambiguous across the server and all its recognition sessions. The content associated with the URI MUST be available to the client until the MRCPv2 session terminates. See the discussion on the sensitivity of saved waveforms in Section 12.
The server MUST also return the size in octets and the duration in milliseconds of the recorded audio waveform as parameters associated with the header field.
waveform-uri = "Waveform-URI" ":" ["<" uri ">" ";" "size" "=" 1*19DIGIT ";" "duration" "=" 1*19DIGIT] CRLF
This header field MAY be specified in the SET-PARAMS, GET-PARAMS, or the RECOGNIZE methods and tells the server resource the media type in which to store captured audio or video, such as the one captured and returned by the Waveform-URI header field.
media-type = "Media-Type" ":" media-type-value CRLF
This optional header field specifies a URI pointing to audio content to be processed by the RECOGNIZE operation. This enables the client to request recognition from a specified buffer or audio file.
input-waveform-uri = "Input-Waveform-URI" ":" uri CRLF
This header field MUST be part of a RECOGNITION-COMPLETE event coming from the recognizer resource to the client. It indicates the reason behind the RECOGNIZE method completion. This header field MUST be sent in the DEFINE-GRAMMAR and RECOGNIZE responses, if they return with a failure status and a COMPLETE state. In the ABNF below, the cause-code contains a numerical value selected from the Cause-Code column of the following table. The cause-name contains the corresponding token selected from the Cause-Name column.
completion-cause = "Completion-Cause" ":" cause-code SP cause-name CRLF
cause-code       = 3DIGIT
cause-name       = *VCHAR
+------------+-----------------------+------------------------------+
| Cause-Code | Cause-Name            | Description                  |
+------------+-----------------------+------------------------------+
| 000        | success               | RECOGNIZE completed with a   |
|            |                       | match or DEFINE-GRAMMAR      |
|            |                       | succeeded in downloading and |
|            |                       | compiling the grammar.       |
|            |                       |                              |
| 001        | no-match              | RECOGNIZE completed, but no  |
|            |                       | match was found.             |
|            |                       |                              |
| 002        | no-input-timeout      | RECOGNIZE completed without  |
|            |                       | a match due to a             |
|            |                       | no-input-timeout.            |
|            |                       |                              |
| 003        | hotword-maxtime       | RECOGNIZE in hotword mode    |
|            |                       | completed without a match    |
|            |                       | due to a                     |
|            |                       | recognition-timeout.         |
|            |                       |                              |
| 004        | grammar-load-failure  | RECOGNIZE failed due to      |
|            |                       | grammar load failure.        |
|            |                       |                              |
| 005        | grammar-compilation-  | RECOGNIZE failed due to      |
|            | failure               | grammar compilation failure. |
|            |                       |                              |
| 006        | recognizer-error      | RECOGNIZE request terminated |
|            |                       | prematurely due to a         |
|            |                       | recognizer error.            |
|            |                       |                              |
| 007        | speech-too-early      | RECOGNIZE request terminated |
|            |                       | because speech was too       |
|            |                       | early. This happens when the |
|            |                       | audio stream is already      |
|            |                       | "in-speech" when the         |
|            |                       | RECOGNIZE request was        |
|            |                       | received.                    |
|            |                       |                              |
| 008        | success-maxtime       | RECOGNIZE request terminated |
|            |                       | because speech was too long  |
|            |                       | but whatever was spoken till |
|            |                       | that point was a full match. |
|            |                       |                              |
| 009        | uri-failure           | Failure accessing a URI.     |
|            |                       |                              |
| 010        | language-unsupported  | Language not supported.      |
|            |                       |                              |
| 011        | cancelled             | A new RECOGNIZE cancelled    |
|            |                       | this one, or a prior         |
|            |                       | RECOGNIZE failed while this  |
|            |                       | one was still in the queue.  |
|            |                       |                              |
| 012        | semantics-failure     | Recognition succeeded, but   |
|            |                       | semantic interpretation of   |
|            |                       | the recognized input failed. |
|            |                       | The RECOGNITION-COMPLETE     |
|            |                       | event MUST contain the       |
|            |                       | Recognition result with only |
|            |                       | input text and no            |
|            |                       | interpretation.              |
|            |                       |                              |
| 013        | partial-match         | Speech Incomplete Timeout    |
|            |                       | expired before there was a   |
|            |                       | full match. But whatever was |
|            |                       | spoken till that point was a |
|            |                       | partial match to one or more |
|            |                       | grammars.                    |
|            |                       |                              |
| 014        | partial-match-maxtime | The Recognition-Timeout      |
|            |                       | expired before full match    |
|            |                       | was achieved. But whatever   |
|            |                       | was spoken till that point   |
|            |                       | was a partial match to one   |
|            |                       | or more grammars.            |
|            |                       |                              |
| 015        | no-match-maxtime      | The Recognition-Timeout      |
|            |                       | expired. Whatever was spoken |
|            |                       | till that point did not      |
|            |                       | match any of the grammars.   |
|            |                       | This cause could also be     |
|            |                       | returned if the recognizer   |
|            |                       | does not support detecting   |
|            |                       | partial grammar matches.     |
|            |                       |                              |
| 016        | grammar-definition-   | Any DEFINE-GRAMMAR error     |
|            | failure               | other than                   |
|            |                       | grammar-load-failure and     |
|            |                       | grammar-compilation-failure. |
+------------+-----------------------+------------------------------+
This header field MAY be specified in a RECOGNITION-COMPLETE event coming from the recognizer resource to the client. This contains the reason text behind the RECOGNIZE request completion. The server uses this header field to communicate text describing the reason for the failure, such as the specific error encountered in parsing a grammar markup.
The completion reason text is provided for client use in logs and for debugging and instrumentation purposes. Clients MUST NOT interpret the completion reason text.
completion-reason = "Completion-Reason" ":" quoted-string CRLF
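For example, a recognizer error detected mid-recognition might be reported as in the following sketch; the reason text is purely illustrative and, as noted above, clients MUST NOT attempt to interpret it programmatically.

   S->C:  MRCP/2.0 ... RECOGNITION-COMPLETE 543261 COMPLETE
          Channel-Identifier:32AECB23433801@speechrecog
          Completion-Cause:006 recognizer-error
          Completion-Reason:"acoustic model unavailable"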
This header field MAY be sent as part of the SET-PARAMS or GET-PARAMS request. If the GET-PARAMS method contains this header field with no value, then it is a request to the recognizer to return the recognizer context block. The response to such a message MAY contain a recognizer context block as a typed media message body. If the server returns a recognizer context block, the response MUST contain this header field and its value MUST match the Content-ID of the corresponding media block.
If the SET-PARAMS method contains this header field, it MUST also contain a message body containing the recognizer context data and a Content-ID matching this header field value. This Content-ID MUST match the Content-ID that came with the context data during the GET-PARAMS operation.
An implementation choosing to use this mechanism to hand off recognizer context data between servers MUST distinguish its implementation-specific block of data by using an IANA-registered content type in the IANA Media Type vendor tree.
recognizer-context-block = "Recognizer-Context-Block" ":" [1*VCHAR] CRLF
This header field MAY be sent as part of the RECOGNIZE request. A value of false tells the recognizer to start recognition but not to start the no-input timer yet. The recognizer MUST NOT start the timers until the client sends a START-INPUT-TIMERS request to the recognizer. This is useful in the scenario when the recognizer and
synthesizer engines are not part of the same session. In such configurations, when a kill-on-barge-in prompt is being played (see Section 8.4.2), the client wants the RECOGNIZE request to be simultaneously active so that it can detect and implement kill-on-barge-in. However, the recognizer SHOULD NOT start the no-input timers until the prompt is finished. The default value is "true".
start-input-timers = "Start-Input-Timers" ":" BOOLEAN CRLF
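A non-normative sketch of the kill-on-barge-in scenario described above: the client issues RECOGNIZE with Start-Input-Timers set to "false" while the prompt is still playing, and sends START-INPUT-TIMERS once the prompt finishes. The request IDs, channel identifier, and grammar URI are hypothetical.

   C->S:  MRCP/2.0 ... RECOGNIZE 543262
          Channel-Identifier:32AECB23433801@speechrecog
          Cancel-If-Queue:false
          Start-Input-Timers:false
          Content-Type:text/uri-list
          Content-Length:...

          http://www.example.com/menu.grxml

          (after the prompt completes)

   C->S:  MRCP/2.0 ... START-INPUT-TIMERS 543263
          Channel-Identifier:32AECB23433801@speechrecog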
This header field specifies the length of silence required following user speech before the speech recognizer finalizes a result (either accepting it or generating a no-match result). The Speech-Complete-Timeout value applies when the recognizer currently has a complete match against an active grammar, and specifies how long the recognizer MUST wait for more input before declaring a match. By contrast, the Speech-Incomplete-Timeout is used when the speech is an incomplete match to an active grammar. The value is in milliseconds.
speech-complete-timeout = "Speech-Complete-Timeout" ":" 1*19DIGIT CRLF
A long Speech-Complete-Timeout value delays the result to the client and therefore makes the application's response to a user slow. A short Speech-Complete-Timeout may lead to an utterance being broken up inappropriately. Reasonable speech complete timeout values are typically in the range of 0.3 seconds to 1.0 seconds. The value for this header field ranges from 0 to an implementation-specific maximum value. The default value for this header field is implementation specific. This header field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS.
This header field specifies the required length of silence following user speech after which a recognizer finalizes a result. The incomplete timeout applies when the speech prior to the silence is an incomplete match of all active grammars. In this case, once the timeout is triggered, the partial result is rejected (with a Completion-Cause of "partial-match"). The value is in milliseconds. The value for this header field ranges from 0 to an implementation-specific maximum value. The default value for this header field is implementation specific.
speech-incomplete-timeout = "Speech-Incomplete-Timeout" ":" 1*19DIGIT CRLF
The Speech-Incomplete-Timeout also applies when the speech prior to the silence is a complete match of an active grammar, but where it is possible to speak further and still match the grammar. By contrast, the Speech-Complete-Timeout is used when the speech is a complete match to an active grammar and no further spoken words can continue to represent a match.
A long Speech-Incomplete-Timeout value delays the result to the client and therefore makes the application's response to a user slow. A short Speech-Incomplete-Timeout may lead to an utterance being broken up inappropriately.
The Speech-Incomplete-Timeout is usually longer than the Speech-Complete-Timeout to allow users to pause mid-utterance (for example, to breathe). This header field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS.
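For instance, an application that wants a quick response on complete matches while still tolerating mid-utterance pauses might set values such as the following (illustrative only):

   C->S:  MRCP/2.0 ... SET-PARAMS 543264
          Channel-Identifier:32AECB23433801@speechrecog
          Speech-Complete-Timeout:500
          Speech-Incomplete-Timeout:1200

   S->C:  MRCP/2.0 ... 543264 200 COMPLETE
          Channel-Identifier:32AECB23433801@speechrecog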
This header field specifies the inter-digit timeout value to use when recognizing DTMF input. The value is in milliseconds. The value for this header field ranges from 0 to an implementation-specific maximum value. The default value is 5 seconds. This header field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS.
dtmf-interdigit-timeout = "DTMF-Interdigit-Timeout" ":" 1*19DIGIT CRLF
This header field specifies the terminating timeout to use when recognizing DTMF input. The DTMF-Term-Timeout applies only when no additional input is allowed by the grammar; otherwise, the DTMF-Interdigit-Timeout applies. The value is in milliseconds. The value for this header field ranges from 0 to an implementation-specific maximum value. The default value is 10 seconds. This header field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS.
dtmf-term-timeout = "DTMF-Term-Timeout" ":" 1*19DIGIT CRLF
This header field specifies the terminating DTMF character for DTMF input recognition. The default value is NULL, which is indicated by an empty header field value. This header field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS.
dtmf-term-char = "DTMF-Term-Char" ":" VCHAR CRLF
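The three DTMF timing header fields above might be set together for a digit-collection dialog as in this sketch; the values and identifiers are illustrative only.

   C->S:  MRCP/2.0 ... SET-PARAMS 543265
          Channel-Identifier:32AECB23433801@speechrecog
          DTMF-Interdigit-Timeout:4000
          DTMF-Term-Timeout:8000
          DTMF-Term-Char:#

   S->C:  MRCP/2.0 ... 543265 200 COMPLETE
          Channel-Identifier:32AECB23433801@speechrecog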
When a recognizer needs to fetch or access a URI and the access fails, the server SHOULD provide the failed URI in this header field in the method response, unless there are multiple URI failures, in which case one of the failed URIs MUST be provided in this header field in the method response.
failed-uri = "Failed-URI" ":" absoluteURI CRLF
When a recognizer method needs a recognizer to fetch or access a URI and the access fails, the server MUST provide the URI-specific or protocol-specific response code for the URI in the Failed-URI header field through this header field in the method response. The value encoding is UTF-8 (RFC 3629 [RFC3629]) to accommodate any access protocol, some of which might have a response string instead of a numeric response code.
failed-uri-cause = "Failed-URI-Cause" ":" 1*UTFCHAR CRLF
This header field allows the client to request the recognizer resource to save the audio input to the recognizer. The recognizer resource MUST then attempt to record the recognized audio, without endpointing, and make it available to the client in the form of a URI returned in the Waveform-URI header field in the RECOGNITION-COMPLETE event. If there was an error in recording the stream or the audio content is otherwise not available, the recognizer MUST return an empty Waveform-URI header field. The default value for this field is "false". This header field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS. See the discussion on the sensitivity of saved waveforms in Section 12.
save-waveform = "Save-Waveform" ":" BOOLEAN CRLF
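A sketch of the round trip follows; the waveform URI, size, duration, and other identifiers are hypothetical, and the NLSML result body of the event is omitted for brevity.

   C->S:  MRCP/2.0 ... RECOGNIZE 543266
          Channel-Identifier:32AECB23433801@speechrecog
          Cancel-If-Queue:false
          Save-Waveform:true
          Content-Type:text/uri-list
          Content-Length:...

          http://www.example.com/order.grxml

   S->C:  MRCP/2.0 ... RECOGNITION-COMPLETE 543266 COMPLETE
          Channel-Identifier:32AECB23433801@speechrecog
          Completion-Cause:000 success
          Waveform-URI:<http://media.example.com/rec/5298.wav>;size=342456;duration=25435
          Content-Type:application/nlsml+xml
          Content-Length:...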
This header field MAY be specified in a RECOGNIZE request and allows the client to tell the server that, from this point on, further input audio comes from a different audio source, channel, or speaker. If the recognizer resource had collected any input statistics or adaptation state, the recognizer resource MUST do what is appropriate for the specific recognition technology, which includes but is not limited to discarding any collected input statistics or adaptation state before starting the RECOGNIZE request. Note that if there are
multiple resources that are sharing a media stream and are collecting or using this data, and the client issues this header field to one of the resources, the reset operation applies to all resources that use the shared media stream. This helps in a number of use cases, including where the client wishes to reuse an open recognition session with an existing media session for multiple telephone calls.
new-audio-channel = "New-Audio-Channel" ":" BOOLEAN CRLF
This header field specifies the language of recognition grammar data within a session or request, if it is not specified within the data. The value of this header field MUST follow RFC 5646 [RFC5646] for its values. This MAY occur in DEFINE-GRAMMAR, RECOGNIZE, SET-PARAMS, or GET-PARAMS requests.
speech-language = "Speech-Language" ":" 1*VCHAR CRLF
This header field lets the client request the server to buffer the utterance associated with this recognition request into a buffer available to a co-resident verifier resource. The buffer is shared across resources within a session and is allocated when a verifier resource is added to this session. The client MUST NOT send this header field unless a verifier resource is instantiated for the session. The buffer is released when the verifier resource is released from the session.
This header field specifies what mode the RECOGNIZE method will operate in. The value choices are "normal" or "hotword". If the value is "normal", the RECOGNIZE starts matching speech and DTMF to the grammars specified in the RECOGNIZE request. If any portion of the speech does not match the grammar, the RECOGNIZE command completes with a no-match status. Timers may be active to detect speech in the audio (see Section 9.4.14), so the RECOGNIZE method may complete because of a timeout waiting for speech. If the value of this header field is "hotword", the RECOGNIZE method operates in hotword mode, where it only looks for the particular keywords or DTMF
sequences specified in the grammar and ignores silence or other speech in the audio stream. The default value for this header field is "normal". This header field MAY occur on the RECOGNIZE method.
recognition-mode = "Recognition-Mode" ":" "normal" / "hotword" CRLF
This header field specifies what will happen if the client attempts to invoke another RECOGNIZE method when this RECOGNIZE request is already in progress for the resource. The value for this header field is a Boolean. A value of "true" means the server MUST terminate this RECOGNIZE request, with a Completion-Cause of "cancelled", if the client issues another RECOGNIZE request for the same resource. A value of "false" for this header field indicates to the server that this RECOGNIZE request will continue to completion, and if the client issues more RECOGNIZE requests to the same resource, they are queued. When the currently active RECOGNIZE request is stopped or completes with a successful match, the first RECOGNIZE method in the queue becomes active. If the current RECOGNIZE fails, all RECOGNIZE methods in the pending queue are cancelled, and each generates a RECOGNITION-COMPLETE event with a Completion-Cause of "cancelled". This header field MUST be present in every RECOGNIZE request. There is no default value.
cancel-if-queue = "Cancel-If-Queue" ":" BOOLEAN CRLF
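For example, a hotword-mode request might look like the following sketch; the grammar URI and identifiers are hypothetical.

   C->S:  MRCP/2.0 ... RECOGNIZE 543267
          Channel-Identifier:32AECB23433801@speechrecog
          Cancel-If-Queue:false
          Recognition-Mode:hotword
          Content-Type:text/uri-list
          Content-Length:...

          http://www.example.com/wakeup-phrases.grxml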
This header field MAY be sent in a hotword mode RECOGNIZE request. It specifies the maximum length of an utterance that will be considered for hotword recognition. This header field, along with Hotword-Min-Duration, can be used to tune performance by preventing the recognizer from evaluating utterances that are too short or too long to be one of the hotwords in the grammar(s). The value is in milliseconds. The default is implementation dependent. If present in a RECOGNIZE request specifying a mode other than "hotword", the header field is ignored.
hotword-max-duration = "Hotword-Max-Duration" ":" 1*19DIGIT CRLF
This header field MAY be sent in a hotword mode RECOGNIZE request. It specifies the minimum length of an utterance that will be considered for hotword recognition. This header field, along
with Hotword-Max-Duration, can be used to tune performance by preventing the recognizer from evaluating utterances that are too short or too long to be one of the hotwords in the grammar(s). The value is in milliseconds. The default value is implementation dependent. If present in a RECOGNIZE request specifying a mode other than "hotword", the header field is ignored.
hotword-min-duration = "Hotword-Min-Duration" ":" 1*19DIGIT CRLF
The value of this header field is used to provide a pointer to the text for which a natural language interpretation is desired. The value is either a URI or text. If the value is a URI, it MUST be a Content-ID that refers to an entity of type 'text/plain' in the body of the message. Otherwise, the server MUST treat the value as the text to be interpreted. This header field MUST be used when invoking the INTERPRET method.
interpret-text = "Interpret-Text" ":" 1*VCHAR CRLF
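A minimal INTERPRET sketch using a literal text value is shown below. The text, identifiers, and grammar reference are illustrative assumptions, and the SRGS grammar body that would normally follow the header fields is omitted.

   C->S:  MRCP/2.0 ... INTERPRET 543268
          Channel-Identifier:32AECB23433801@speechrecog
          Interpret-Text:may I speak to Andre Roy
          Content-Type:application/srgs+xml
          Content-ID:<request1@form-level.store>
          Content-Length:...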
This header field MAY be specified in a GET-PARAMS or SET-PARAMS method and is used to specify the amount of time, in milliseconds, of the type-ahead buffer for the recognizer. This is the buffer that collects DTMF digits as they are pressed even when there is no RECOGNIZE command active. When a subsequent RECOGNIZE method is received, it MUST look to this buffer to match the RECOGNIZE request. If the digits in the buffer are not sufficient, then it can continue to listen to more digits to match the grammar. The default size of this DTMF buffer is platform specific.
dtmf-buffer-time = "DTMF-Buffer-Time" ":" 1*19DIGIT CRLF
This header field MAY be specified in a RECOGNIZE method and is used to tell the recognizer to clear the DTMF type-ahead buffer before starting the RECOGNIZE. The default value of this header field is "false", which does not clear the type-ahead buffer before starting the RECOGNIZE method. If this header field is specified to be "true", then the RECOGNIZE will clear the DTMF buffer before starting recognition. This means digits pressed by the caller before the RECOGNIZE command was issued are discarded.
clear-dtmf-buffer = "Clear-DTMF-Buffer" ":" BOOLEAN CRLF
This header field MAY be specified in a RECOGNIZE method and is used to tell the recognizer that it MUST NOT wait for the end of speech before processing the collected speech to match active grammars. A value of "true" indicates the recognizer MUST do early matching. The default value for this header field if not specified is "false". If the recognizer does not support the processing of the collected audio before the end of speech, this header field can be safely ignored.
early-no-match = "Early-No-Match" ":" BOOLEAN CRLF
This header field MAY be specified in a START-PHRASE-ENROLLMENT, SET-PARAMS, or GET-PARAMS method and is used to specify the minimum number of consistent pronunciations that must be obtained to voice enroll a new phrase. The minimum value is 1. The default value is implementation specific and MAY be greater than 1.
num-min-consistent-pronunciations = "Num-Min-Consistent-Pronunciations" ":" 1*19DIGIT CRLF
This header field MAY be sent as part of the START-PHRASE-ENROLLMENT, SET-PARAMS, or GET-PARAMS method. Used during voice enrollment, this header field specifies how similar to a previously enrolled pronunciation of the same phrase an utterance needs to be in order to be considered "consistent". The higher the threshold, the closer the match between an utterance and previous pronunciations must be for the pronunciation to be considered consistent. The range for this threshold is a float value between 0.0 and 1.0. The default value for this header field is implementation specific.
consistency-threshold = "Consistency-Threshold" ":" FLOAT CRLF
This header field MAY be sent as part of the START-PHRASE-ENROLLMENT, SET-PARAMS, or GET-PARAMS method. Used during voice enrollment, this header field specifies how similar the pronunciations of two different phrases can be before they are considered to be clashing. For example, pronunciations of phrases such as "John Smith" and "Jon Smits" may be so similar that they are difficult to distinguish correctly. A smaller threshold reduces the number of clashes detected. The range for this threshold is a float value between 0.0
and 1.0. The default value for this header field is implementation specific. Clash testing can be turned off completely by setting the Clash-Threshold header field value to 0.
clash-threshold = "Clash-Threshold" ":" FLOAT CRLF
This header field specifies the speaker-trained grammar to be used or referenced during enrollment operations. Phrases are added to this grammar during enrollment. For example, a contact list for user "Jeff" could be stored at the Personal-Grammar-URI "http://myserver.example.com/myenrollmentdb/jeff-list". The generated grammar syntax MAY be implementation specific. There is no default value for this header field. This header field MAY be sent as part of the START-PHRASE-ENROLLMENT, SET-PARAMS, or GET-PARAMS method.
personal-grammar-uri = "Personal-Grammar-URI" ":" uri CRLF
This header field MAY be specified in the RECOGNIZE method. If this header field is set to "true" and an Enrollment is active, the RECOGNIZE command MUST add the collected utterance to the personal grammar that is being enrolled. The way in which this occurs is engine specific and may be an area of future standardization. The default value for this header field is "false".
enroll-utterance = "Enroll-Utterance" ":" BOOLEAN CRLF
This header field in a request identifies a phrase in an existing personal grammar for which enrollment is desired. It is also returned to the client in the RECOGNIZE complete event. This header field MAY occur in START-PHRASE-ENROLLMENT, MODIFY-PHRASE, or DELETE-PHRASE requests. There is no default value for this header field.
phrase-id = "Phrase-ID" ":" 1*VCHAR CRLF
This string specifies the interpreted text to be returned when the phrase is recognized. This header field MAY occur in START-PHRASE-ENROLLMENT and MODIFY-PHRASE requests. There is no default value for this header field.
phrase-nl = "Phrase-NL" ":" 1*UTFCHAR CRLF
The value of this header field represents the occurrence likelihood of a phrase in an enrolled grammar. When using grammar enrollment, the system is essentially constructing a grammar segment consisting of a list of possible match phrases. This can be thought of to be similar to the dynamic construction of a <one-of> tag in the W3C grammar specification. Each enrolled-phrase becomes an item in the list that can be matched against spoken input similar to the <item> within a <one-of> list. This header field allows you to assign a weight to the phrase (i.e., <item> entry) in the <one-of> list that is enrolled. Grammar weights are normalized to a sum of one at grammar compilation time, so a weight value of 1 for each phrase in an enrolled grammar list indicates all items in that list have the same weight. This header field MAY occur in START-PHRASE-ENROLLMENT and MODIFY-PHRASE requests. The default value for this header field is implementation specific.
weight = "Weight" ":" FLOAT CRLF
This header field allows the client to request the recognizer resource to save the audio stream for the best repetition of the phrase that was used during the enrollment session. The recognizer MUST attempt to record the recognized audio and make it available to the client in the form of a URI returned in the Waveform-URI header field in the response to the END-PHRASE-ENROLLMENT method. If there was an error in recording the stream or the audio data is otherwise not available, the recognizer MUST return an empty Waveform-URI header field. This header field MAY occur in the START-PHRASE-ENROLLMENT, SET-PARAMS, and GET-PARAMS methods.
save-best-waveform = "Save-Best-Waveform" ":" BOOLEAN CRLF
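The enrollment header fields defined above are typically supplied together on START-PHRASE-ENROLLMENT, as in this non-normative sketch. The phrase ID, NL string, and threshold values are hypothetical; the Personal-Grammar-URI reuses the example URI given earlier. Subsequent RECOGNIZE requests with Enroll-Utterance set to "true" then add the caller's repetitions of the phrase to that personal grammar.

   C->S:  MRCP/2.0 ... START-PHRASE-ENROLLMENT 543269
          Channel-Identifier:32AECB23433801@speechrecog
          Num-Min-Consistent-Pronunciations:2
          Consistency-Threshold:0.7
          Clash-Threshold:0.3
          Personal-Grammar-URI:http://myserver.example.com/myenrollmentdb/jeff-list
          Phrase-ID:contact-042
          Phrase-NL:Jeff Smith
          Weight:1.0
          Save-Best-Waveform:true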
This header field replaces the ID used to identify the phrase in a personal grammar. The recognizer returns the new ID when using an enrollment grammar. This header field MAY occur in MODIFY-PHRASE requests.
new-phrase-id = "New-Phrase-ID" ":" 1*VCHAR CRLF
This header field specifies a grammar that defines invalid phrases for enrollment. For example, typical applications do not allow an enrolled phrase that is also a command word. This header field MAY occur in RECOGNIZE requests that are part of an enrollment session.
confusable-phrases-uri = "Confusable-Phrases-URI" ":" uri CRLF
This header field MAY be specified in the END-PHRASE-ENROLLMENT method to abort the phrase enrollment, rather than committing the phrase to the personal grammar.
abort-phrase-enrollment = "Abort-Phrase-Enrollment" ":" BOOLEAN CRLF
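For example, a client abandoning an enrollment in progress might end it as follows (identifiers hypothetical):

   C->S:  MRCP/2.0 ... END-PHRASE-ENROLLMENT 543270
          Channel-Identifier:32AECB23433801@speechrecog
          Abort-Phrase-Enrollment:true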
A recognizer message can carry additional data associated with the request, response, or event. The client MAY provide the grammar to be recognized in DEFINE-GRAMMAR or RECOGNIZE requests. When one or more grammars are specified using the DEFINE-GRAMMAR method, the server MUST attempt to fetch, compile, and optimize the grammar before returning a response to the DEFINE-GRAMMAR method. A RECOGNIZE request MUST completely specify the grammars to be active during the recognition operation, except when the RECOGNIZE method is being used to enroll a grammar. During grammar enrollment, such grammars are OPTIONAL. The server resource sends the recognition results in the RECOGNITION-COMPLETE event and the GET-RESULT response. Grammars and recognition results are carried in the message body of the corresponding MRCPv2 messages.
Recognizer grammar data from the client to the server can be provided inline or by reference. Either way, grammar data is carried as typed media entities in the message body of the RECOGNIZE or DEFINE-GRAMMAR
request. All MRCPv2 servers MUST accept grammars in the XML form (media type 'application/srgs+xml') of the W3C's XML-based Speech Grammar Markup Format (SRGS) [W3C.REC-speech-grammar-20040316] and MAY accept grammars in other formats. Examples include but are not limited to:
o the ABNF form (media type 'application/srgs') of SRGS
o Sun's Java Speech Grammar Format (JSGF) [refs.javaSpeechGrammarFormat]
Additionally, MRCPv2 servers MAY support the Semantic Interpretation for Speech Recognition (SISR) [W3C.REC-semantic-interpretation-20070405] specification.
When a grammar is specified inline in the request, the client MUST provide a Content-ID for that grammar as part of the content header fields. If there is no space on the server to store the inline grammar, the request MUST return with a Completion-Cause code of 016 "grammar-definition-failure". Otherwise, the server MUST associate the inline grammar block with that Content-ID and MUST store it on the server for the duration of the session. However, if the Content-ID is redefined later in the session through a subsequent DEFINE-GRAMMAR, the inline grammar previously associated with the Content-ID MUST be freed. If the Content-ID is redefined through a subsequent DEFINE-GRAMMAR with an empty message body (i.e., no grammar definition), then in addition to freeing any grammar previously associated with the Content-ID, the server MUST clear all bindings and associations to the Content-ID. Unless and until subsequently redefined, this URI MUST be interpreted by the server as one that has never been set.
Grammars that have been associated with a Content-ID can be referenced through the 'session' URI scheme (see Section 13.6). For example:

   session:help@root-level.store
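As a non-normative sketch, a client might install a grammar once with DEFINE-GRAMMAR and later activate it by its Content-ID through the 'session' URI scheme. The identifiers and request IDs are hypothetical, and the SRGS grammar body of the DEFINE-GRAMMAR request is omitted.

   C->S:  MRCP/2.0 ... DEFINE-GRAMMAR 543271
          Channel-Identifier:32AECB23433801@speechrecog
          Content-Type:application/srgs+xml
          Content-ID:<help@root-level.store>
          Content-Length:...

   C->S:  MRCP/2.0 ... RECOGNIZE 543272
          Channel-Identifier:32AECB23433801@speechrecog
          Cancel-If-Queue:false
          Content-Type:text/uri-list
          Content-Length:...

          session:help@root-level.store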
Grammar data MAY be specified using external URI references. To do so, the client uses a body of media type 'text/uri-list' (see RFC 2483 [RFC2483]) to list the one or more URIs that point to the grammar data. The client can use a body of media type 'text/grammar-ref-list' (see Section 13.5.1) if it wants to assign weights to the list of grammar URIs. All MRCPv2 servers MUST support grammar access using the 'http' and 'https' URI schemes.
If the grammar data the client wishes to be used on a request consists of a mix of URI and inline grammar data, the client uses the 'multipart/mixed' media type to enclose the 'text/uri-list',
'application/srgs', or 'application/srgs+xml' content entities. The character set and encoding used in the grammar data are specified according to standard media type definitions.
When more than one grammar URI or inline grammar block is specified in a message body of the RECOGNIZE request, the server interprets this as a list of grammar alternatives to match against.
   Content-Type:application/srgs+xml
   Content-ID:<request1@form-level.store>
   Content-Length:...

   <?xml version="1.0"?>

   <!-- the default grammar language is US English -->
   <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
            version="1.0" root="request">

   <!-- single language attachment to tokens -->
   <rule id="yes">
     <one-of>
       <item xml:lang="fr-CA">oui</item>
       <item xml:lang="en-US">yes</item>
     </one-of>
   </rule>

   <!-- single language attachment to a rule expansion -->
   <rule id="request"> may I speak to
     <one-of xml:lang="fr-CA">
       <item>Michel Tremblay</item>
       <item>Andre Roy</item>
     </one-of>
   </rule>

   <!-- multiple language attachment to a token -->
   <rule id="people1">
     <token lexicon="en-US,fr-CA"> Robert </token>
   </rule>

   <!-- the equivalent single-language attachment expansion -->
   <rule id="people2">
     <one-of>
       <item xml:lang="en-US">Robert</item>
       <item xml:lang="fr-CA">Robert</item>
     </one-of>
   </rule>

   </grammar>

   SRGS Grammar Example
   Content-Type:text/uri-list
   Content-Length:...

   session:help@root-level.store
   http://www.example.com/Directory-Name-List.grxml
   http://www.example.com/Department-List.grxml
   http://www.example.com/TAC-Contact-List.grxml
   session:menu1@menu-level.store

   Grammar Reference Example
   Content-Type:multipart/mixed; boundary="break"

   --break
   Content-Type:text/uri-list
   Content-Length:...

   http://www.example.com/Directory-Name-List.grxml
   http://www.example.com/Department-List.grxml
   http://www.example.com/TAC-Contact-List.grxml

   --break
   Content-Type:application/srgs+xml
   Content-ID:<request1@form-level.store>
   Content-Length:...

   <?xml version="1.0"?>

   <!-- the default grammar language is US English -->
   <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
            version="1.0">

   <!-- single language attachment to tokens -->
   <rule id="yes">
     <one-of>
       <item xml:lang="fr-CA">oui</item>
       <item xml:lang="en-US">yes</item>
     </one-of>
   </rule>

   <!-- single language attachment to a rule expansion -->
   <rule id="request"> may I speak to
     <one-of xml:lang="fr-CA">
       <item>Michel Tremblay</item>
       <item>Andre Roy</item>
     </one-of>
   </rule>

   <!-- multiple language attachment to a token -->
   <rule id="people1">
     <token lexicon="en-US,fr-CA"> Robert </token>
   </rule>

   <!-- the equivalent single-language attachment expansion -->
   <rule id="people2">
     <one-of>
       <item xml:lang="en-US">Robert</item>
       <item xml:lang="fr-CA">Robert</item>
     </one-of>
   </rule>

   </grammar>
   --break--

   Mixed Grammar Reference Example
Recognition results are returned to the client in the message body of the RECOGNITION-COMPLETE event or the GET-RESULT response message as described in Section 6.3. Element and attribute descriptions for the recognition portion of the NLSML format are provided in Section 9.6 with a normative definition of the schema in Section 16.1.
   Content-Type:application/nlsml+xml
   Content-Length:...

   <?xml version="1.0"?>
   <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
           xmlns:ex="http://www.example.com/example"
           grammar="http://www.example.com/theYesNoGrammar">
     <interpretation>
       <instance>
         <ex:response>yes</ex:response>
       </instance>
       <input>OK</input>
     </interpretation>
   </result>

   Result Example
Enrollment results are returned to the client in the message body of the RECOGNITION-COMPLETE event as described in Section 6.3. Element and attribute descriptions for the enrollment portion of the NLSML format are provided in Section 9.7 with a normative definition of the schema in Section 16.2.
When a client changes servers while operating on the behalf of the same incoming communication session, this header field allows the client to collect a block of opaque data from one server and provide it to another server. This capability is desirable if the client needs different language support or because the server issued a redirect. Here, the first recognizer resource may have collected acoustic and other data during its execution of recognition methods. After a server switch, communicating this data may allow the recognizer resource on the new server to provide better recognition. This block of data is implementation specific and MUST be carried as media type 'application/octets' in the body of the message.
This block of data is communicated in the SET-PARAMS and GET-PARAMS method/response messages. In the GET-PARAMS method, if an empty Recognizer-Context-Block header field is present, then the recognizer SHOULD return its vendor-specific context block, if any, in the message body as an entity of media type 'application/octets' with a specific Content-ID. The Content-ID value MUST also be specified in the Recognizer-Context-Block header field in the GET-PARAMS response. The SET-PARAMS request wishing to provide this vendor-specific data MUST send it in the message body as a typed entity with the same
Content-ID that it received from the GET-PARAMS. The Content-ID MUST also be sent in the Recognizer-Context-Block header field of the SET-PARAMS message.
Each speech recognition implementation choosing to use this mechanism to hand off recognizer context data among servers MUST distinguish its implementation-specific block of data from other implementations by choosing a Content-ID that is recognizable among the participating servers and unlikely to collide with values chosen by another implementation.
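A sketch of the handoff described above follows. The Content-ID and channel identifier are hypothetical, and the octet content itself is opaque to the client. After switching servers, the client would present the same typed entity and Content-ID to the new recognizer resource in a SET-PARAMS request carrying the Recognizer-Context-Block header field.

   C->S:  MRCP/2.0 ... GET-PARAMS 543273
          Channel-Identifier:32AECB23433801@speechrecog
          Recognizer-Context-Block:

   S->C:  MRCP/2.0 ... 543273 200 COMPLETE
          Channel-Identifier:32AECB23433801@speechrecog
          Recognizer-Context-Block:<context-block-1@example.com>
          Content-Type:application/octets
          Content-ID:<context-block-1@example.com>
          Content-Length:...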
The recognizer portion of NLSML (see Section 6.3.1) represents information automatically extracted from a user's utterances by a semantic interpretation component, where "utterance" is to be taken in the general sense of a meaningful user input in any modality supported by the MRCPv2 implementation.
MRCPv2 recognizer resources employ the Natural Language Semantics Markup Language (NLSML) to interpret natural language speech input and to format the interpretation for consumption by an MRCPv2 client.
The elements of the markup fall into the following general functional categories: interpretation, side information, and multi-modal integration.
Elements and attributes represent the semantics of a user's utterance, including the <result>, <interpretation>, and <instance> elements. The <result> element contains the full result of processing one utterance. It MAY contain multiple <interpretation> elements if the interpretation of the utterance results in multiple alternative meanings due to uncertainty in speech recognition or natural language understanding. There are at least two reasons for providing multiple interpretations:
1. The client application might have additional information, for example, information from a database, that would allow it to select a preferred interpretation from among the possible interpretations returned from the semantic interpreter.
2. A client-based dialog manager (e.g., VoiceXML [W3C.REC-voicexml20-20040316]) that was unable to select between several competing interpretations could use this information to go back to the user and find out what was intended. For example, it could issue a SPEAK request to a synthesizer resource to emit "Did you say 'Boston' or 'Austin'?"
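For instance, an utterance that is ambiguous between two city names might be returned with two alternatives, modeled on the NLSML examples in this document. The grammar URI, the "ex" namespace, and the confidence values are illustrative; the "confidence" attribute itself is described later in this section.

   <?xml version="1.0"?>
   <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
           xmlns:ex="http://www.example.com/example"
           grammar="http://www.example.com/cities.grxml">
     <interpretation confidence="0.6">
       <instance>
         <ex:city>Boston</ex:city>
       </instance>
       <input mode="speech">boston</input>
     </interpretation>
     <interpretation confidence="0.4">
       <instance>
         <ex:city>Austin</ex:city>
       </instance>
       <input mode="speech">austin</input>
     </interpretation>
   </result>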
These are elements and attributes representing additional information about the interpretation, over and above the interpretation itself. Side information includes:
1. Whether an interpretation was achieved (the <nomatch> element) and the system's confidence in an interpretation (the "confidence" attribute of <interpretation>).
2. Alternative interpretations (<interpretation>)
3. Input formats and Automatic Speech Recognition (ASR) information: the <input> element, representing the input to the semantic interpreter.
When more than one modality is available for input, the interpretation of the inputs needs to be coordinated. The "mode" attribute of <input> supports this by indicating whether the utterance was input by speech, DTMF, pointing, etc. The "timestamp-start" and "timestamp-end" attributes of <input> also provide for temporal coordination by indicating when inputs occurred.
The recognizer elements in NLSML fall into two categories:
1. description of the input that was processed, and
2. description of the meaning which was extracted from the input.
Next to each element are its attributes. In addition, some elements can contain multiple instances of other elements. For example, a <result> can contain multiple <interpretation> elements, each of which is taken to be an alternative. Similarly, <input> can contain multiple child <input> elements, which are taken to be cumulative. To illustrate the basic usage of these elements, as a simple example,
每个元素的旁边是它的属性。此外,某些元素可以包含其他元素的多个实例。例如,<result>可以包含多个<explation>元素,每个元素都被视为备选元素。类似地,<input>可以包含多个子<input>元素,这些元素被认为是累积的。为了说明这些元素的基本用法,作为一个简单的示例,
consider the utterance "OK" (interpreted as "yes"). The example illustrates how that utterance and its interpretation would be represented in the NLSML markup.
考虑“OK”的发音(解释为“是”)。该示例说明了如何在NLSML标记中表示该语句及其解释。
<?xml version="1.0"?> <result xmlns="urn:ietf:params:xml:ns:mrcpv2" xmlns:ex="http://www.example.com/example" grammar="http://www.example.com/theYesNoGrammar"> <interpretation> <instance> <ex:response>yes</ex:response> </instance> <input>OK</input> </interpretation> </result>
<?xml version="1.0"?> <result xmlns="urn:ietf:params:xml:ns:mrcpv2" xmlns:ex="http://www.example.com/example" grammar="http://www.example.com/theYesNoGrammar"> <interpretation> <instance> <ex:response>yes</ex:response> </instance> <input>OK</input> </interpretation> </result>
This example includes only the minimum required information. There is an overall <result> element, which includes one interpretation and an input element. The interpretation contains the application-specific element "<response>", which is the semantically interpreted result.

The root element of the markup is <result>. The <result> element includes one or more <interpretation> elements. Multiple interpretations can result from ambiguities in the input or in the semantic interpretation. If the "grammar" attribute does not apply to all of the interpretations in the result, it can be overridden for individual interpretations at the <interpretation> level.

Attributes:

1. grammar: The grammar or recognition rule matched by this result. The format of the grammar attribute will match the rule reference semantics defined in the grammar specification. Specifically, the rule reference is in the external XML form for grammar rule references. The markup interpreter needs to know the grammar rule that is matched by the utterance because multiple rules may be simultaneously active. The value is the grammar URI used by the markup interpreter to specify the grammar. The grammar can be overridden by a grammar attribute in the <interpretation> element if the input was ambiguous as to which grammar it matched. If all interpretation elements within the result element contain their own grammar attributes, the attribute can be dropped from the result element.
<?xml version="1.0"?> <result xmlns="urn:ietf:params:xml:ns:mrcpv2" grammar="http://www.example.com/grammar"> <interpretation> .... </interpretation> </result>
<?xml version="1.0"?> <result xmlns="urn:ietf:params:xml:ns:mrcpv2" grammar="http://www.example.com/grammar"> <interpretation> .... </interpretation> </result>
An <interpretation> element contains a single semantic interpretation.

Attributes:

1. confidence: A float value from 0.0-1.0 indicating the semantic analyzer's confidence in this interpretation. A value of 1.0 indicates maximum confidence. The values are implementation dependent but are intended to align with the value interpretation for the confidence MRCPv2 header field defined in Section 9.4.1. This attribute is OPTIONAL.

2. grammar: The grammar or recognition rule matched by this interpretation (if needed to override the grammar specification at the <interpretation> level.) This attribute is only needed under <interpretation> if it is necessary to override a grammar that was defined at the <result> level. Note that the grammar attribute for the interpretation element is optional if and only if the grammar attribute is specified in the <result> element.

Interpretations MUST be sorted best-first by some measure of "goodness". The goodness measure is "confidence" if present; otherwise, it is some implementation-specific indication of quality.

The grammar is expected to be specified most frequently at the <result> level. However, it can be overridden at the <interpretation> level because it is possible that different interpretations may match different grammar rules.
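As a non-normative illustration of this ordering, the sketch below shows a result carrying two alternative interpretations, sorted best-first by confidence, for the "Boston"/"Austin" ambiguity mentioned earlier. The grammar URI and the element names inside <instance> are hypothetical.

<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="http://www.example.com/cityGrammar">
  <interpretation confidence="0.62">
    <instance>
      <ex:city>Boston</ex:city>
    </instance>
    <input mode="speech">Boston</input>
  </interpretation>
  <interpretation confidence="0.57">
    <instance>
      <ex:city>Austin</ex:city>
    </instance>
    <input mode="speech">Austin</input>
  </interpretation>
</result>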
The <interpretation> element includes an optional <input> element containing the input being analyzed, and at least one <instance> element containing the interpretation of the utterance.
<interpretation confidence="0.75" grammar="http://www.example.com/grammar"> ... </interpretation>
<interpretation confidence="0.75" grammar="http://www.example.com/grammar"> ... </interpretation>
The <instance> element contains the interpretation of the utterance. When the Semantic Interpretation for Speech Recognition format is used, the <instance> element contains the XML serialization of the result using the approach defined in that specification. When there is semantic markup in the grammar that does not create semantic objects, but instead only does a semantic translation of a portion of the input, such as translating "coke" to "coca-cola", the instance contains the whole input but with the translation applied. The NLSML looks like the markup in Figure 2 below. If there are no semantic objects created, nor any semantic translation, the instance value is the same as the input value.

Attributes:

1. confidence: Each element of the instance MAY have a confidence attribute, defined in the NLSML namespace. The confidence attribute contains a float value in the range from 0.0-1.0 reflecting the system's confidence in the analysis of that slot. A value of 1.0 indicates maximum confidence. The values are implementation dependent, but are intended to align with the value interpretation for the MRCPv2 header field Confidence-Threshold defined in Section 9.4.1. This attribute is OPTIONAL.
<instance>
  <nameAddress>
    <street confidence="0.75">123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </nameAddress>
</instance>
<input>
  My address is 123 Maple Street,
  Mill Valley, California, 90952
</input>

<instance>
  I would like to buy a coca-cola
</instance>
<input>
  I would like to buy a coke
</input>

Figure 2: NLSML Example
The <input> element is the text representation of a user's input. It includes an optional "confidence" attribute, which indicates the recognizer's confidence in the recognition result (as opposed to the confidence in the interpretation, which is indicated by the "confidence" attribute of <interpretation>). Optional "timestamp-start" and "timestamp-end" attributes indicate the start and end times of a spoken utterance, in ISO 8601 format [ISO.8601.1988].

Attributes:

1. timestamp-start: The time at which the input began. (optional)

2. timestamp-end: The time at which the input ended. (optional)

3. mode: The modality of the input, for example, speech, DTMF, etc. (optional)

4. confidence: The confidence of the recognizer in the correctness of the input in the range 0.0 to 1.0. (optional)

Note that it may not make sense for temporally overlapping inputs to have the same mode; however, this constraint is not expected to be enforced by implementations.

When there is no time zone designator, ISO 8601 time representations default to local time.
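For example (values hypothetical), a start time given without a zone designator, such as

   timestamp-start="2000-04-03T09:30:00"

is interpreted as local time, whereas "2000-04-03T09:30:00Z" or "2000-04-03T09:30:00-05:00" would pin the same field to UTC or to an offset of five hours behind UTC, respectively.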
There are three possible formats for the <input> element.

1. The <input> element can contain simple text:

<input>onions</input>
A future possibility is for <input> to contain not only text but additional markup that represents prosodic information that was contained in the original utterance and extracted by the speech recognizer. This depends on the availability of ASRs that are capable of producing prosodic information. MRCPv2 clients MUST be prepared to receive such markup and MAY make use of it.

2. An <input> tag can also contain additional <input> tags. Having additional input elements allows the representation to support future multi-modal inputs as well as finer-grained speech information, such as timestamps for individual words and word-level confidences.
<input> <input mode="speech" confidence="0.5" timestamp-start="2000-04-03T0:00:00" timestamp-end="2000-04-03T0:00:00.2">fried</input> <input mode="speech" confidence="1.0" timestamp-start="2000-04-03T0:00:00.25" timestamp-end="2000-04-03T0:00:00.6">onions</input> </input>
<input> <input mode="speech" confidence="0.5" timestamp-start="2000-04-03T0:00:00" timestamp-end="2000-04-03T0:00:00.2">fried</input> <input mode="speech" confidence="1.0" timestamp-start="2000-04-03T0:00:00.25" timestamp-end="2000-04-03T0:00:00.6">onions</input> </input>
3. Finally, the <input> element can contain <nomatch> and <noinput> elements, which describe situations in which the speech recognizer received input that it was unable to process or did not receive any input at all, respectively.

The <nomatch> element under <input> is used to indicate that the semantic interpreter was unable to successfully match any input with confidence above the threshold. It can optionally contain the text of the best of the (rejected) matches.
<interpretation>
  <instance/>
  <input confidence="0.1">
    <nomatch/>
  </input>
</interpretation>

<interpretation>
  <instance/>
  <input mode="speech" confidence="0.1">
    <nomatch>I want to go to New York</nomatch>
  </input>
</interpretation>
<noinput> indicates that there was no input -- a timeout occurred in the speech recognizer due to silence.

<interpretation>
  <instance/>
  <input>
    <noinput/>
  </input>
</interpretation>
If there are multiple levels of inputs, the most natural place for <nomatch> and <noinput> elements to appear is under the highest level of <input> for <noinput>, and under the appropriate level of <input> for <nomatch>. So, <noinput> means "no input at all" and <nomatch> means "no match in speech modality" or "no match in DTMF modality". For example, to represent garbled speech combined with DTMF "1 2 3 4", the markup would be:

<input>
  <input mode="speech"><nomatch/></input>
  <input mode="dtmf">1 2 3 4</input>
</input>
Note: while <noinput> could be represented as an attribute of input, <nomatch> cannot, since it could potentially include PCDATA content with the best match. For parallelism, <noinput> is also an element.
All enrollment elements are contained within a single <enrollment-result> element under <result>. The elements are described below and have the schema defined in Section 16.2. The following elements are defined:

1. num-clashes

2. num-good-repetitions

3. num-repetitions-still-needed

4. consistency-status

5. clash-phrase-ids

6. transcriptions

7. confusable-phrases
The <num-clashes> element contains the number of clashes that this pronunciation has with other pronunciations in an active enrollment session. The associated Clash-Threshold header field determines the sensitivity of the clash measurement. Note that clash testing can be turned off completely by setting the Clash-Threshold header field value to 0.

The <num-good-repetitions> element contains the number of consistent pronunciations obtained so far in an active enrollment session.

The <num-repetitions-still-needed> element contains the number of consistent pronunciations that must still be obtained before the new phrase can be added to the enrollment grammar. The number of consistent pronunciations required is specified by the client in the request header field Num-Min-Consistent-Pronunciations. The returned value must be 0 before the client can successfully commit a phrase to the grammar by ending the enrollment session.

The <consistency-status> element is used to indicate how consistent the repetitions are when learning a new phrase. It can have the values of consistent, inconsistent, and undecided.

The <clash-phrase-ids> element contains the phrase IDs of clashing pronunciation(s), if any. This element is absent if there are no clashes.

The <transcriptions> element contains the transcriptions returned in the last repetition of the phrase being enrolled.

The <confusable-phrases> element contains a list of phrases from a command grammar that are confusable with the phrase being added to the personal grammar. This element MAY be absent if there are no confusable phrases.
The DEFINE-GRAMMAR method, from the client to the server, provides one or more grammars and requests the server to access, fetch, and compile the grammars as needed. The DEFINE-GRAMMAR method implementation MUST do a fetch of all external URIs that are part of that operation. If caching is implemented, this URI fetching MUST conform to the cache control hints and parameter header fields associated with the method in deciding whether the URIs should be fetched from cache or from the external server. If these hints/parameters are not specified in the method, the values set for the session using SET-PARAMS/GET-PARAMS apply. If they were not set for the session, their default values apply.
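As a non-normative sketch, a client could supply such hints on the DEFINE-GRAMMAR request itself; the Cache-Control directive, request-id, and grammar reference shown below are illustrative only.

C->S: MRCP/2.0 ... DEFINE-GRAMMAR 543270
      Channel-Identifier:32AECB23433801@speechrecog
      Cache-Control:max-age=86400
      Content-Type:application/srgs+xml
      Content-ID:<menu@form-level.store>
      Content-Length:...

      (an SRGS grammar body containing external rule references
       would follow here)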
If the server resource is in the recognition state, the server MUST respond to the DEFINE-GRAMMAR request with a failure status.

If the resource is in the idle state and is able to successfully process the supplied grammars, the server MUST return a success status-code and the request-state MUST be COMPLETE.
If the recognizer resource could not define the grammar for some reason (for example, if the download failed, the grammar failed to compile, or the grammar was in an unsupported form), the MRCPv2 response for the DEFINE-GRAMMAR method MUST contain a failure status-code of 407 and contain a Completion-Cause header field describing the failure reason.
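For instance, a server that cannot fetch a referenced grammar might answer along the following lines (the specific Completion-Cause shown is illustrative; the normative cause codes are given in the Completion-Cause header field description):

S->C: MRCP/2.0 ... 543257 407 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog
      Completion-Cause:004 grammar-load-failure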
C->S: MRCP/2.0 ... DEFINE-GRAMMAR 543257
      Channel-Identifier:32AECB23433801@speechrecog
      Content-Type:application/srgs+xml
      Content-ID:<request1@form-level.store>
      Content-Length:...
<?xml version="1.0"?>
<?xml version="1.0"?>
<!-- the default grammar language is US English --> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0">
<!-- the default grammar language is US English --> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0">
<!-- single language attachment to tokens --> <rule id="yes"> <one-of> <item xml:lang="fr-CA">oui</item> <item xml:lang="en-US">yes</item> </one-of> </rule>
<!-- single language attachment to tokens --> <rule id="yes"> <one-of> <item xml:lang="fr-CA">oui</item> <item xml:lang="en-US">yes</item> </one-of> </rule>
<!-- single language attachment to a rule expansion --> <rule id="request"> may I speak to <one-of xml:lang="fr-CA"> <item>Michel Tremblay</item> <item>Andre Roy</item> </one-of> </rule>
<!-- single language attachment to a rule expansion --> <rule id="request"> may I speak to <one-of xml:lang="fr-CA"> <item>Michel Tremblay</item> <item>Andre Roy</item> </one-of> </rule>
</grammar>
</grammar>
S->C: MRCP/2.0 ... 543257 200 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog
      Completion-Cause:000 success
C->S: MRCP/2.0 ... DEFINE-GRAMMAR 543258
      Channel-Identifier:32AECB23433801@speechrecog
      Content-Type:application/srgs+xml
      Content-ID:<helpgrammar@root-level.store>
      Content-Length:...
<?xml version="1.0"?>
<?xml version="1.0"?>
<!-- the default grammar language is US English --> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0">
<!-- the default grammar language is US English --> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0">
<rule id="request"> I need help </rule>
<rule id="request"> I need help </rule>
S->C: MRCP/2.0 ... 543258 200 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog
      Completion-Cause:000 success
C->S: MRCP/2.0 ... DEFINE-GRAMMAR 543259
      Channel-Identifier:32AECB23433801@speechrecog
      Content-Type:application/srgs+xml
      Content-ID:<request2@field-level.store>
      Content-Length:...
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN" "http://www.w3.org/TR/speech-grammar/grammar.dtd">
<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN" "http://www.w3.org/TR/speech-grammar/grammar.dtd">
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/06/grammar http://www.w3.org/TR/speech-grammar/grammar.xsd" version="1.0" mode="voice" root="basicCmd">
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/06/grammar http://www.w3.org/TR/speech-grammar/grammar.xsd" version="1.0" mode="voice" root="basicCmd">
<meta name="author" content="Stephanie Williams"/>
<meta name="author" content="Stephanie Williams"/>
<rule id="basicCmd" scope="public"> <example> please move the window </example> <example> open a file </example>
<rule id="basicCmd" scope="public"> <example> please move the window </example> <example> open a file </example>
<ruleref uri="http://grammar.example.com/politeness.grxml#startPolite"/>
<ruleref uri="http://grammar.example.com/politeness.grxml#startPolite"/>
<ruleref uri="#command"/> <ruleref uri="http://grammar.example.com/politeness.grxml#endPolite"/> </rule>
<ruleref uri="#command"/> <ruleref uri="http://grammar.example.com/politeness.grxml#endPolite"/> </rule>
<rule id="command"> <ruleref uri="#action"/> <ruleref uri="#object"/> </rule>
<rule id="command"> <ruleref uri="#action"/> <ruleref uri="#object"/> </rule>
<rule id="action"> <one-of> <item weight="10"> open <tag>open</tag> </item> <item weight="2"> close <tag>close</tag> </item> <item weight="1"> delete <tag>delete</tag> </item> <item weight="1"> move <tag>move</tag> </item> </one-of> </rule>
<rule id="action"> <one-of> <item weight="10"> open <tag>open</tag> </item> <item weight="2"> close <tag>close</tag> </item> <item weight="1"> delete <tag>delete</tag> </item> <item weight="1"> move <tag>move</tag> </item> </one-of> </rule>
<rule id="object"> <item repeat="0-1"> <one-of> <item> the </item> <item> a </item> </one-of> </item>
<rule id="object"> <item repeat="0-1"> <one-of> <item> the </item> <item> a </item> </one-of> </item>
<one-of> <item> window </item> <item> file </item> <item> menu </item> </one-of> </rule>
<one-of> <item> window </item> <item> file </item> <item> menu </item> </one-of> </rule>
</grammar>
</grammar>
S->C: MRCP/2.0 ... 543259 200 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog
      Completion-Cause:000 success
C->S: MRCP/2.0 ... RECOGNIZE 543260
      Channel-Identifier:32AECB23433801@speechrecog
      N-Best-List-Length:2
      Content-Type:text/uri-list
      Content-Length:...

      session:request1@form-level.store
      session:request2@field-level.store
      session:helpgrammar@root-level.store
S->C: MRCP/2.0 ... 543260 200 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechrecog

S->C: MRCP/2.0 ... START-OF-INPUT 543260 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechrecog
S->C: MRCP/2.0 ... RECOGNITION-COMPLETE 543260 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog
      Completion-Cause:000 success
      Waveform-URI:<http://web.media.com/session123/audio.wav>;
                   size=124535;duration=2340
      Content-Type:application/x-nlsml
      Content-Length:...

<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="session:request1@form-level.store">
  <interpretation>
    <instance name="Person">
      <ex:Person>
        <ex:Name> Andre Roy </ex:Name>
      </ex:Person>
    </instance>
    <input> may I speak to Andre Roy </input>
  </interpretation>
</result>
Define Grammar Example
The RECOGNIZE method from the client to the server requests the recognizer to start recognition and provides it with one or more grammar references for grammars to match against the input media. The RECOGNIZE method can carry header fields to control the sensitivity, confidence level, and the level of detail in results provided by the recognizer. These header field values override the current values set by a previous SET-PARAMS method.

The RECOGNIZE method can request the recognizer resource to operate in normal or hotword mode as specified by the Recognition-Mode header field. The default value is "normal". If the resource could not start a recognition, the server MUST respond with a failure status-code of 407 and a Completion-Cause header field in the response describing the cause of failure.

The RECOGNIZE request uses the message body to specify the grammars applicable to the request. The active grammar(s) for the request can be specified in one of three ways. If the client needs to explicitly control grammar weights for the recognition operation, it MUST employ method 3 below. The order of these grammars specifies the precedence of the grammars that is used when more than one grammar in the list matches the speech; in this case, the grammar with the higher precedence is returned as a match. This precedence capability is useful in applications like VoiceXML browsers to order grammars specified at the dialog, document, and root level of a VoiceXML application.
1. The grammar MAY be placed directly in the message body as typed content. If more than one grammar is included in the body, the order of inclusion controls the corresponding precedence for the grammars during recognition, with earlier grammars in the body having a higher precedence than later ones.

2. The body MAY contain a list of grammar URIs specified in content of media type 'text/uri-list' [RFC2483]. The order of the URIs determines the corresponding precedence for the grammars during recognition, with highest precedence first and decreasing for each URI thereafter.

3. The body MAY contain a list of grammar URIs specified in content of media type 'text/grammar-ref-list'. This type defines a list of grammar URIs and allows each grammar URI to be assigned a weight in the list. This weight has the same meaning as the weights described in Section 2.4.1 of the Speech Recognition Grammar Specification (SRGS) [W3C.REC-speech-grammar-20040316]. A sketch of such a body appears below.
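The following non-normative sketch illustrates method 3; the URIs are placeholders, and the weight syntax follows the 'text/grammar-ref-list' media type registration elsewhere in this specification.

C->S: MRCP/2.0 ... RECOGNIZE 543261
      Channel-Identifier:32AECB23433801@speechrecog
      Content-Type:text/grammar-ref-list
      Content-Length:...

      <http://grammar.example.com/dialog-level.grxml>;weight="2.0"
      <http://grammar.example.com/document-level.grxml>;weight="1.0"
      <http://grammar.example.com/root-level.grxml>;weight="0.5"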
In addition to performing recognition on the input, the recognizer MUST also enroll the collected utterance in a personal grammar if the Enroll-Utterance header field is set to true and an Enrollment is active (via an earlier execution of the START-PHRASE-ENROLLMENT method). If so, and if the RECOGNIZE request contains a Content-ID header field, then the resulting grammar (which includes the personal grammar as a sub-grammar) can be referenced through the 'session' URI scheme (see Section 13.6).

If the resource was able to successfully start the recognition, the server MUST return a success status-code and a request-state of IN-PROGRESS. This means that the recognizer is active and that the client MUST be prepared to receive further events with this request-id.

If the resource was able to queue the request, the server MUST return a success code and request-state of PENDING. This means that the recognizer is currently active with another request and that this request has been queued for processing.
If the resource could not start a recognition, the server MUST respond with a failure status-code of 407 and a Completion-Cause header field in the response describing the cause of failure.

For the recognizer resource, RECOGNIZE and INTERPRET are the only requests that return a request-state of IN-PROGRESS, meaning that recognition is in progress. When the recognition completes by matching one of the grammar alternatives or by a timeout without a match or for some other reason, the recognizer resource MUST send the client a RECOGNITION-COMPLETE event (or INTERPRETATION-COMPLETE, if INTERPRET was the request) with the result of the recognition and a request-state of COMPLETE.

Large grammars can take a long time for the server to compile. For grammars that are used repeatedly, the client can improve server performance by issuing a DEFINE-GRAMMAR request with the grammar ahead of time. In such a case, the client can issue the RECOGNIZE request and reference the grammar through the 'session' URI scheme (see Section 13.6). This also applies in general if the client wants to repeat recognition with a previous inline grammar.
The RECOGNIZE method implementation MUST do a fetch of all external URIs that are part of that operation. If caching is implemented, this URI fetching MUST conform to the cache control hints and parameter header fields associated with the method in deciding whether it should be fetched from cache or from the external server. If these hints/parameters are not specified in the method, the values set for the session using SET-PARAMS/GET-PARAMS apply. If they were not set for the session, their default values apply.

Note that since the audio and the messages are carried over separate communication paths there may be a race condition between the start of the flow of audio and the receipt of the RECOGNIZE method. For example, if an audio flow is started by the client at the same time as the RECOGNIZE method is sent, either the audio or the RECOGNIZE can arrive at the recognizer first. As another example, the client may choose to continuously send audio to the server and signal the server to recognize using the RECOGNIZE method. Mechanisms to resolve this condition are outside the scope of this specification. The recognizer can expect the media to start flowing when it receives the RECOGNIZE request, but it MUST NOT buffer anything it receives beforehand in order to preserve the semantics that application authors expect with respect to the input timers.
When a RECOGNIZE method has been received, the recognition is initiated on the stream. The No-Input-Timer MUST be started at this time if the Start-Input-Timers header field is specified as "true". If this header field is set to "false", the No-Input-Timer MUST be started when it receives the START-INPUT-TIMERS method from the client. The Recognition-Timeout MUST be started when the recognition resource detects speech or a DTMF digit in the media stream.
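As a brief non-normative sketch, a client that wants the no-input timer held back until its prompt finishes might issue a request along these lines (header values and the grammar reference are illustrative):

C->S: MRCP/2.0 ... RECOGNIZE 543262
      Channel-Identifier:32AECB23433801@speechrecog
      Start-Input-Timers:false
      No-Input-Timeout:5000
      Content-Type:text/uri-list
      Content-Length:...

      session:request1@form-level.store

It would later send START-INPUT-TIMERS once the prompt has completed, as described further below.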
For recognition when not in hotword mode:

When the recognizer resource detects speech or a DTMF digit in the media stream, it MUST send the START-OF-INPUT event. When enough speech has been collected for the server to process, the recognizer can try to match the collected speech with the active grammars. If the speech collected at this point fully matches with any of the active grammars, the Speech-Complete-Timer is started. If it matches partially with one or more of the active grammars, with more speech needed before a full match is achieved, then the Speech-Incomplete-Timer is started.
1. When the No-Input-Timer expires, the recognizer MUST complete with a Completion-Cause code of "no-input-timeout".

2. The recognizer MUST support detecting a no-match condition upon detecting end of speech. The recognizer MAY support detecting a no-match condition before waiting for end-of-speech. If this is supported, this capability is enabled by setting the Early-No-Match header field to "true". Upon detecting a no-match condition, the RECOGNIZE MUST return with "no-match".

3. When the Speech-Incomplete-Timer expires, the recognizer SHOULD complete with a Completion-Cause code of "partial-match", unless the recognizer cannot differentiate a partial-match, in which case it MUST return a Completion-Cause code of "no-match". The recognizer MAY return results for the partially matched grammar.

4. When the Speech-Complete-Timer expires, the recognizer MUST complete with a Completion-Cause code of "success".

5. When the Recognition-Timeout expires, one of the following MUST happen:

5.1. If there was a partial-match, the recognizer SHOULD complete with a Completion-Cause code of "partial-match-maxtime", unless the recognizer cannot differentiate a partial-match, in which case it MUST complete with a Completion-Cause code of "no-match-maxtime". The recognizer MAY return results for the partially matched grammar.

5.2. If there was a full-match, the recognizer MUST complete with a Completion-Cause code of "success-maxtime".

5.3. If there was a no match, the recognizer MUST complete with a Completion-Cause code of "no-match-maxtime".
For recognition in hotword mode:

Note that for recognition in hotword mode the START-OF-INPUT event is not generated when speech or a DTMF digit is detected.
1. When the No-Input-Timer expires, the recognizer MUST complete with a Completion-Cause code of "no-input-timeout".

2. If at any point a match occurs, the RECOGNIZE MUST complete with a Completion-Cause code of "success".

3. When the Recognition-Timeout expires and there is not a match, the RECOGNIZE MUST complete with a Completion-Cause code of "hotword-maxtime".

4. When the Recognition-Timeout expires and there is a match, the RECOGNIZE MUST complete with a Completion-Cause code of "success-maxtime".

5. When the Recognition-Timeout is running but the detected speech/ DTMF has not resulted in a match, the Recognition-Timeout MUST be stopped and reset. It MUST then be restarted when speech/DTMF is again detected.
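A minimal non-normative sketch of a hotword-mode request follows; the timeout value and the grammar reference are illustrative only.

C->S: MRCP/2.0 ... RECOGNIZE 543263
      Channel-Identifier:32AECB23433801@speechrecog
      Recognition-Mode:hotword
      Recognition-Timeout:30000
      Content-Type:text/uri-list
      Content-Length:...

      session:helpgrammar@root-level.store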
Below is a complete example of using RECOGNIZE. It shows the call to RECOGNIZE, the IN-PROGRESS and START-OF-INPUT status messages, and the final RECOGNITION-COMPLETE message containing the result.
C->S: MRCP/2.0 ... RECOGNIZE 543257
      Channel-Identifier:32AECB23433801@speechrecog
      Confidence-Threshold:0.9
      Content-Type:application/srgs+xml
      Content-ID:<request1@form-level.store>
      Content-Length:...
<?xml version="1.0"?>
<?xml version="1.0"?>
<!-- the default grammar language is US English --> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" root="request">
<!-- the default grammar language is US English --> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" root="request">
<!-- single language attachment to tokens --> <rule id="yes"> <one-of> <item xml:lang="fr-CA">oui</item> <item xml:lang="en-US">yes</item> </one-of> </rule>
<!-- single language attachment to tokens --> <rule id="yes"> <one-of> <item xml:lang="fr-CA">oui</item> <item xml:lang="en-US">yes</item> </one-of> </rule>
<!-- single language attachment to a rule expansion --> <rule id="request"> may I speak to <one-of xml:lang="fr-CA"> <item>Michel Tremblay</item> <item>Andre Roy</item> </one-of> </rule>
<!-- single language attachment to a rule expansion --> <rule id="request"> may I speak to <one-of xml:lang="fr-CA"> <item>Michel Tremblay</item> <item>Andre Roy</item> </one-of> </rule>
</grammar>
</grammar>
S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechrecog

S->C: MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechrecog
S->C: MRCP/2.0 ... RECOGNITION-COMPLETE 543257 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog
      Completion-Cause:000 success
      Waveform-URI:<http://web.media.com/session123/audio.wav>;
                   size=424252;duration=2543
      Content-Type:application/nlsml+xml
      Content-Length:...

<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="session:request1@form-level.store">
  <interpretation>
    <instance name="Person">
      <ex:Person>
        <ex:Name> Andre Roy </ex:Name>
      </ex:Person>
    </instance>
    <input> may I speak to Andre Roy </input>
  </interpretation>
</result>
Below is an example of calling RECOGNIZE with a different grammar. No status or completion messages are shown in this example, although they would of course occur in normal usage.
C->S: MRCP/2.0 ... RECOGNIZE 543257
      Channel-Identifier:32AECB23433801@speechrecog
      Confidence-Threshold:0.9
      Fetch-Timeout:20
      Content-Type:application/srgs+xml
      Content-Length:...
<?xml version="1.0"? Version="1.0" mode="voice" root="Basic md"> <rule id="rule_list" scope="public"> <one-of> <item weight=10> <ruleref uri= "http://grammar.example.com/world-cities.grxml#canada"/> </item> <item weight=1.5> <ruleref uri= "http://grammar.example.com/world-cities.grxml#america"/> </item> <item weight=0.5> <ruleref uri= "http://grammar.example.com/world-cities.grxml#india"/> </item> </one-of> </rule>
<?xml version="1.0"? Version="1.0" mode="voice" root="Basic md"> <rule id="rule_list" scope="public"> <one-of> <item weight=10> <ruleref uri= "http://grammar.example.com/world-cities.grxml#canada"/> </item> <item weight=1.5> <ruleref uri= "http://grammar.example.com/world-cities.grxml#america"/> </item> <item weight=0.5> <ruleref uri= "http://grammar.example.com/world-cities.grxml#india"/> </item> </one-of> </rule>
The STOP method from the client to the server tells the resource to stop recognition if a request is active. If a RECOGNIZE request is active and the STOP request successfully terminated it, then the response header section contains an Active-Request-Id-List header field containing the request-id of the RECOGNIZE request that was terminated. In this case, no RECOGNITION-COMPLETE event is sent for the terminated request. If there was no recognition active, then the response MUST NOT contain an Active-Request-Id-List header field. Either way, the response MUST contain a status-code of 200 "Success".
C->S: MRCP/2.0 ... RECOGNIZE 543257
      Channel-Identifier:32AECB23433801@speechrecog
      Confidence-Threshold:0.9
      Content-Type:application/srgs+xml
      Content-ID:<request1@form-level.store>
      Content-Length:...
<?xml version="1.0"?>
<?xml version="1.0"?>
<!-- the default grammar language is US English --> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" root="request">
<!-- the default grammar language is US English --> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" root="request">
<!-- single language attachment to tokens --> <rule id="yes"> <one-of> <item xml:lang="fr-CA">oui</item> <item xml:lang="en-US">yes</item> </one-of> </rule>
<!-- single language attachment to tokens --> <rule id="yes"> <one-of> <item xml:lang="fr-CA">oui</item> <item xml:lang="en-US">yes</item> </one-of> </rule>
<!-- single language attachment to a rule expansion --> <rule id="request"> may I speak to <one-of xml:lang="fr-CA"> <item>Michel Tremblay</item> <item>Andre Roy</item> </one-of> </rule> </grammar>
<!-- single language attachment to a rule expansion --> <rule id="request"> may I speak to <one-of xml:lang="fr-CA"> <item>Michel Tremblay</item> <item>Andre Roy</item> </one-of> </rule> </grammar>
S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechrecog

C->S: MRCP/2.0 ... STOP 543258 200
      Channel-Identifier:32AECB23433801@speechrecog

S->C: MRCP/2.0 ... 543258 200 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog
      Active-Request-Id-List:543257
The GET-RESULT method from the client to the server MAY be issued when the recognizer resource is in the recognized state. This request allows the client to retrieve results for a completed recognition. This is useful if the client decides it wants more alternatives or more information. When the server receives this request, it re-computes and returns the results according to the recognition constraints provided in the GET-RESULT request.

The GET-RESULT request can specify constraints such as a different confidence-threshold or n-best-list-length. This capability is OPTIONAL for MRCPv2 servers and the automatic speech recognition engine in the server MUST return a status of unsupported feature if not supported.
C->S: MRCP/2.0 ... GET-RESULT 543257
      Channel-Identifier:32AECB23433801@speechrecog
      Confidence-Threshold:0.9

S->C: MRCP/2.0 ... 543257 200 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog
      Content-Type:application/nlsml+xml
      Content-Length:...

<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="session:request1@form-level.store">
  <interpretation>
    <instance name="Person">
      <ex:Person>
        <ex:Name> Andre Roy </ex:Name>
      </ex:Person>
    </instance>
    <input> may I speak to Andre Roy </input>
  </interpretation>
</result>
This is an event from the server to the client indicating that the recognizer resource has detected speech or a DTMF digit in the media stream. This event is useful in implementing kill-on-barge-in scenarios when a synthesizer resource is in a different session from the recognizer resource and hence is not aware of an incoming audio source (see Section 8.4.2). In these cases, it is up to the client to act as an intermediary and respond to this event by issuing a BARGE-IN-OCCURRED event to the synthesizer resource. The recognizer resource also MUST send a Proxy-Sync-Id header field with a unique value for this event.

This event MUST be generated by the server, irrespective of whether or not the synthesizer and recognizer are on the same server.
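For example, such an event might look like the following (the Proxy-Sync-Id value is arbitrary):

S->C: MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechrecog
      Proxy-Sync-Id:923917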
This request is sent from the client to the recognizer resource when it knows that a kill-on-barge-in prompt has finished playing (see Section 8.4.2). This is useful in the scenario when the recognition and synthesizer engines are not in the same session. When a kill-on-barge-in prompt is being played, the client may want a RECOGNIZE request to be simultaneously active so that it can detect and implement kill-on-barge-in. But at the same time the client doesn't want the recognizer to start the no-input timers until the prompt is finished. The Start-Input-Timers header field in the RECOGNIZE request allows the client to say whether or not the timers should be started immediately. If not, the recognizer resource MUST NOT start the timers until the client sends a START-INPUT-TIMERS method to the recognizer.
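A non-normative sketch of the resulting exchange, once the client learns that the prompt has finished (the request-id is arbitrary):

C->S: MRCP/2.0 ... START-INPUT-TIMERS 543259
      Channel-Identifier:32AECB23433801@speechrecog

S->C: MRCP/2.0 ... 543259 200 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog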
This is an event from the recognizer resource to the client indicating that the recognition completed. The recognition result is sent in the body of the MRCPv2 message. The request-state field MUST be COMPLETE indicating that this is the last event with that request-id and that the request with that request-id is now complete. The server MUST maintain the recognizer context containing the results and the audio waveform input of that recognition until the next RECOGNIZE request is issued for that resource or the session terminates. If the server returns a URI to the audio waveform, it MUST do so in a Waveform-URI header field in the RECOGNITION-COMPLETE event. The client can use this URI to retrieve or playback the audio.

Note, if an enrollment session was active, the RECOGNITION-COMPLETE event can contain either recognition or enrollment results depending on what was spoken. The following example shows a complete exchange with a recognition result.
C->S: MRCP/2.0 ... RECOGNIZE 543257
      Channel-Identifier:32AECB23433801@speechrecog
      Confidence-Threshold:0.9
      Content-Type:application/srgs+xml
      Content-ID:<request1@form-level.store>
      Content-Length:...
<?xml version="1.0"?>
<?xml version="1.0"?>
<!-- the default grammar language is US English --> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" root="request">
<!-- the default grammar language is US English --> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" root="request">
<!-- single language attachment to tokens --> <rule id="yes"> <one-of> <item xml:lang="fr-CA">oui</item> <item xml:lang="en-US">yes</item> </one-of> </rule>
<!-- single language attachment to tokens --> <rule id="yes"> <one-of> <item xml:lang="fr-CA">oui</item> <item xml:lang="en-US">yes</item> </one-of> </rule>
<!-- single language attachment to a rule expansion --> <rule id="request"> may I speak to <one-of xml:lang="fr-CA"> <item>Michel Tremblay</item> <item>Andre Roy</item> </one-of> </rule> </grammar>
<!-- single language attachment to a rule expansion --> <rule id="request"> may I speak to <one-of xml:lang="fr-CA"> <item>Michel Tremblay</item> <item>Andre Roy</item> </one-of> </rule> </grammar>
S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechrecog

S->C: MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechrecog

S->C: MRCP/2.0 ... RECOGNITION-COMPLETE 543257 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog
      Completion-Cause:000 success
      Waveform-URI:<http://web.media.com/session123/audio.wav>;
                   size=342456;duration=25435
      Content-Type:application/nlsml+xml
      Content-Length:...

<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="session:request1@form-level.store">
  <interpretation>
    <instance name="Person">
      <ex:Person>
        <ex:Name> Andre Roy </ex:Name>
      </ex:Person>
    </instance>
    <input> may I speak to Andre Roy </input>
  </interpretation>
</result>
If the result were instead an enrollment result, the final message from the server above could have been:
S->C: MRCP/2.0 ... RECOGNITION-COMPLETE 543257 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog
      Completion-Cause:000 success
      Content-Type:application/nlsml+xml
      Content-Length:...

<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        grammar="Personal-Grammar-URI">
  <enrollment-result>
    <num-clashes> 2 </num-clashes>
    <num-good-repetitions> 1 </num-good-repetitions>
    <num-repetitions-still-needed>
      1
    </num-repetitions-still-needed>
    <consistency-status> consistent </consistency-status>
    <clash-phrase-ids>
      <item> Jeff </item>
      <item> Andre </item>
    </clash-phrase-ids>
    <transcriptions>
      <item> m ay b r ow k er </item>
      <item> m ax r aa k ah </item>
    </transcriptions>
    <confusable-phrases>
      <item>
        <phrase> call </phrase>
        <confusion-level> 10 </confusion-level>
      </item>
    </confusable-phrases>
  </enrollment-result>
</result>
The START-PHRASE-ENROLLMENT method from the client to the server starts a new phrase enrollment session during which the client can call RECOGNIZE multiple times to enroll a new utterance in a grammar. An enrollment session consists of a set of calls to RECOGNIZE in which the caller speaks a phrase several times so the system can "learn" it. The phrase is then added to a personal grammar (speaker-trained grammar), so that the system can recognize it later.

Only one phrase enrollment session can be active at a time for a resource. The Personal-Grammar-URI identifies the grammar that is used during enrollment to store the personal list of phrases. Once RECOGNIZE is called, the result is returned in a RECOGNITION-COMPLETE event and will contain either an enrollment result OR a recognition result for a regular recognition.
Calling END-PHRASE-ENROLLMENT ends the ongoing phrase enrollment session, which is typically done after a sequence of successful calls to RECOGNIZE. This method can be called to commit the new phrase to the personal grammar or to abort the phrase enrollment session.

The grammar to contain the new enrolled phrase, specified by Personal-Grammar-URI, is created if it does not exist. Also, the personal grammar MUST ONLY contain phrases added via a phrase enrollment session.

The Phrase-ID passed to this method is used to identify this phrase in the grammar and will be returned as the speech input when doing a RECOGNIZE on the grammar. The Phrase-NL similarly is returned in a RECOGNITION-COMPLETE event in the same manner as other Natural Language (NL) in a grammar. The tag-format of this NL is implementation specific.

If the client has specified Save-Best-Waveform as true, then the response after ending the phrase enrollment session MUST contain the location/URI of a recording of the best repetition of the learned phrase.
C->S: MRCP/2.0 ... START-PHRASE-ENROLLMENT 543258 Channel-Identifier:32AECB23433801@speechrecog Num-Min-Consistent-Pronunciations:2 Consistency-Threshold:30 Clash-Threshold:12 Personal-Grammar-URI:<personal grammar uri> Phrase-Id:<phrase id> Phrase-NL:<NL phrase> Weight:1 Save-Best-Waveform:true
S->C: MRCP/2.0 ... 543258 200 COMPLETE Channel-Identifier:32AECB23433801@speechrecog
The ENROLLMENT-ROLLBACK method discards the last live utterance from the RECOGNIZE operation. The client can invoke this method when the caller provides undesirable input such as non-speech noises, side-speech, commands, utterance from the RECOGNIZE grammar, etc. Note that this method does not provide a stack of rollback states. Executing ENROLLMENT-ROLLBACK twice in succession without an intervening recognition operation has no effect the second time.
ENROLLMENT-ROLLBACK方法从识别操作中丢弃最后一个实时话语。当调用方提供不需要的输入(如非语音噪音、旁白、命令、来自识别语法的话语等)时,客户端可以调用此方法。请注意,此方法不提供回滚状态堆栈。连续执行注册-回滚两次而不进行中间的识别操作,第二次不起作用。
C->S: MRCP/2.0 ... ENROLLMENT-ROLLBACK 543261 Channel-Identifier:32AECB23433801@speechrecog
S->C: MRCP/2.0 ... 543261 200 COMPLETE Channel-Identifier:32AECB23433801@speechrecog
The client MAY call the END-PHRASE-ENROLLMENT method ONLY during an active phrase enrollment session. It MUST NOT be called during an ongoing RECOGNIZE operation. To commit the new phrase in the grammar, the client MAY call this method once successive calls to RECOGNIZE have succeeded and Num-Repetitions-Still-Needed has been returned as 0 in the RECOGNITION-COMPLETE event. Alternatively, the client MAY abort the phrase enrollment session by calling this method with the Abort-Phrase-Enrollment header field.
客户端只能在活动短语注册会话期间调用结束短语注册方法。在正在进行的识别操作期间,不得调用它。要提交语法中的新短语,一旦连续调用RECOGNITION成功,并且RECOGNITION-COMPLETE事件中仍需要的Num Repetitions返回为0,客户端可以调用此方法。或者,客户端可以通过使用abort Phase enrollment header字段调用此方法来中止短语注册会话。
If the client has specified Save-Best-Waveform as "true" in the START-PHRASE-ENROLLMENT request, then the response MUST contain a Waveform-URI header whose value is the location/URI of a recording of the best repetition of the learned phrase.
如果客户机在开始短语注册请求中将“保存最佳波形”指定为“真”,则响应必须包含波形URI头,其值为学习短语最佳重复记录的位置/URI。
C->S: MRCP/2.0 ... END-PHRASE-ENROLLMENT 543262 Channel-Identifier:32AECB23433801@speechrecog
S->C: MRCP/2.0 ... 543262 200 COMPLETE Channel-Identifier:32AECB23433801@speechrecog Waveform-URI:<http://mediaserver.com/recordings/file1324.wav>; size=242453;duration=25432
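The exchange above commits the enrolled phrase. As a hypothetical sketch (the request-id 543263 is arbitrary, the channel identifier is reused from the earlier examples, and the boolean value is assumed), a client could instead abandon the enrollment session by including the Abort-Phrase-Enrollment header field described above:

C->S: MRCP/2.0 ... END-PHRASE-ENROLLMENT 543263
      Channel-Identifier:32AECB23433801@speechrecog
      Abort-Phrase-Enrollment:true

S->C: MRCP/2.0 ... 543263 200 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog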
The MODIFY-PHRASE method sent from the client to the server is used to change the phrase ID, NL phrase, and/or weight for a given phrase in a personal grammar.
从客户端发送到服务器的MODIFY-PHRASE方法用于更改个人语法中给定短语的短语ID、NL短语和/或权重。
If no fields are supplied, then calling this method has no effect.
如果未提供任何字段,则调用此方法无效。
C->S: MRCP/2.0 ... MODIFY-PHRASE 543265 Channel-Identifier:32AECB23433801@speechrecog Personal-Grammar-URI:<personal grammar uri> Phrase-Id:<phrase id> New-Phrase-Id:<new phrase id> Phrase-NL:<NL phrase> Weight:1
S->C: MRCP/2.0 ... 543265 200 COMPLETE Channel-Identifier:32AECB23433801@speechrecog
The DELETE-PHRASE method sent from the client to the server is used to delete a phrase that is in a personal grammar and was added through voice enrollment or text enrollment. If the specified phrase does not exist, this method has no effect.
从客户端发送到服务器的DELETE-PHRASE方法用于删除个人语法中通过语音注册或文本注册添加的阶段。如果指定的短语不存在,则此方法无效。
C->S: MRCP/2.0 ... DELETE-PHRASE 543266 Channel-Identifier:32AECB23433801@speechrecog Personal-Grammar-URI:<personal grammar uri> Phrase-Id:<phrase id>
S->C: MRCP/2.0 ... 543266 200 COMPLETE Channel-Identifier:32AECB23433801@speechrecog
The INTERPRET method from the client to the server takes as input an Interpret-Text header field containing the text for which the semantic interpretation is desired, and returns, via the INTERPRETATION-COMPLETE event, an interpretation result that is very similar to the one returned from a RECOGNIZE method invocation. Only
从客户端到服务器的解释方法将包含需要语义解释的文本的解释文本标题字段作为输入,并通过解释完成事件返回与识别方法调用返回的解释结果非常相似的解释结果。只有
portions of the result relevant to acoustic matching are excluded from the result. The Interpret-Text header field MUST be included in the INTERPRET request.
结果中不包括与声学匹配相关的部分。解释请求中必须包含解释文本标题字段。
Recognizer grammar data is treated in the same way as it is when issuing a RECOGNIZE method call.
识别器语法数据的处理方式与发出识别方法调用时的处理方式相同。
If a RECOGNIZE, RECORD, or another INTERPRET operation is already in progress for the resource, the server MUST reject the request with a response having a status-code of 402 "Method not valid in this state", and a COMPLETE request state.
如果资源的识别、记录或另一解释操作已经在进行中,则服务器必须拒绝请求,并且响应的状态代码为402“Method not valid in this state”(方法在此状态下无效),并且请求状态完整。
C->S: MRCP/2.0 ... INTERPRET 543266 Channel-Identifier:32AECB23433801@speechrecog Interpret-Text:may I speak to Andre Roy Content-Type:application/srgs+xml Content-ID:<request1@form-level.store> Content-Length:...
<?xml version="1.0"?> <!-- the default grammar language is US English --> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" root="request"> <!-- single language attachment to tokens --> <rule id="yes"> <one-of> <item xml:lang="fr-CA">oui</item> <item xml:lang="en-US">yes</item> </one-of> </rule>
<?xml version="1.0"?> <!-- the default grammar language is US English --> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" root="request"> <!-- single language attachment to tokens --> <rule id="yes"> <one-of> <item xml:lang="fr-CA">oui</item> <item xml:lang="en-US">yes</item> </one-of> </rule>
<!-- single language attachment to a rule expansion --> <rule id="request"> may I speak to <one-of xml:lang="fr-CA"> <item>Michel Tremblay</item> <item>Andre Roy</item> </one-of> </rule> </grammar>
S->C: MRCP/2.0 ... 543266 200 IN-PROGRESS Channel-Identifier:32AECB23433801@speechrecog
S->C: MRCP/2.0 ... INTERPRETATION-COMPLETE 543266 200 COMPLETE Channel-Identifier:32AECB23433801@speechrecog Completion-Cause:000 success Content-Type:application/nlsml+xml Content-Length:...
<?xml version="1.0"?> <result xmlns="urn:ietf:params:xml:ns:mrcpv2" xmlns:ex="http://www.example.com/example" grammar="session:request1@form-level.store"> <interpretation> <instance name="Person"> <ex:Person> <ex:Name> Andre Roy </ex:Name> </ex:Person> </instance> <input> may I speak to Andre Roy </input> </interpretation> </result>
<?xml version="1.0"?> <result xmlns="urn:ietf:params:xml:ns:mrcpv2" xmlns:ex="http://www.example.com/example" grammar="session:request1@form-level.store"> <interpretation> <instance name="Person"> <ex:Person> <ex:Name> Andre Roy </ex:Name> </ex:Person> </instance> <input> may I speak to Andre Roy </input> </interpretation> </result>
This event from the recognizer resource to the client indicates that the INTERPRET operation is complete. The interpretation result is sent in the body of the MRCP message. The request state MUST be set to COMPLETE.
从识别器资源到客户端的此事件表示解释操作已完成。解释结果在MRCP报文正文中发送。请求状态必须设置为“完成”。
The Completion-Cause header field MUST be included in this event and MUST be set to an appropriate value from the list of cause codes.
完成原因标题字段必须包含在此事件中,并且必须设置为原因代码列表中的适当值。
C->S: MRCP/2.0 ... INTERPRET 543266 Channel-Identifier:32AECB23433801@speechrecog Interpret-Text:may I speak to Andre Roy Content-Type:application/srgs+xml Content-ID:<request1@form-level.store> Content-Length:...
<?xml version="1.0"?> <!-- the default grammar language is US English --> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" root="request"> <!-- single language attachment to tokens --> <rule id="yes"> <one-of> <item xml:lang="fr-CA">oui</item> <item xml:lang="en-US">yes</item> </one-of> </rule>
<?xml version="1.0"?> <!-- the default grammar language is US English --> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" root="request"> <!-- single language attachment to tokens --> <rule id="yes"> <one-of> <item xml:lang="fr-CA">oui</item> <item xml:lang="en-US">yes</item> </one-of> </rule>
<!-- single language attachment to a rule expansion --> <rule id="request"> may I speak to <one-of xml:lang="fr-CA"> <item>Michel Tremblay</item> <item>Andre Roy</item> </one-of> </rule> </grammar>
S->C: MRCP/2.0 ... 543266 200 IN-PROGRESS Channel-Identifier:32AECB23433801@speechrecog
S->C: MRCP/2.0 ... INTERPRETATION-COMPLETE 543266 200 COMPLETE Channel-Identifier:32AECB23433801@speechrecog Completion-Cause:000 success Content-Type:application/nlsml+xml Content-Length:...
<?xml version="1.0"?> <result xmlns="urn:ietf:params:xml:ns:mrcpv2" xmlns:ex="http://www.example.com/example" grammar="session:request1@form-level.store"> <interpretation> <instance name="Person"> <ex:Person> <ex:Name> Andre Roy </ex:Name> </ex:Person> </instance> <input> may I speak to Andre Roy </input> </interpretation> </result>
<?xml version="1.0"?> <result xmlns="urn:ietf:params:xml:ns:mrcpv2" xmlns:ex="http://www.example.com/example" grammar="session:request1@form-level.store"> <interpretation> <instance name="Person"> <ex:Person> <ex:Name> Andre Roy </ex:Name> </ex:Person> </instance> <input> may I speak to Andre Roy </input> </interpretation> </result>
Digits received as DTMF tones are delivered to the recognition resource in the MRCPv2 server in the RTP stream according to RFC 4733 [RFC4733]. The Automatic Speech Recognizer (ASR) MUST support RFC 4733 to recognize digits, and it MAY support recognizing DTMF tones [Q.23] in the audio.
根据RFC 4733[RFC4733],作为DTMF音接收的数字以RTP流的形式传送到MRCPv2服务器中的识别资源。自动语音识别器(ASR)必须支持RFC 4733来识别数字,并且可能支持识别音频中的DTMF音调[Q.23]。
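For illustration only, delivery of DTMF per RFC 4733 is negotiated in the SDP audio description by offering a telephone-event payload type; the port and the dynamic payload type number 101 below are arbitrary example values, not requirements of this specification:

   m=audio 49170 RTP/AVP 0 101
   a=rtpmap:0 PCMU/8000
   a=rtpmap:101 telephone-event/8000
   a=fmtp:101 0-15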
This resource captures received audio and video and stores it as content pointed to by a URI. The main usages of recorders are
此资源捕获接收到的音频和视频,并将其存储为URI指向的内容。记录器的主要用途是
1. to capture speech audio that may be submitted for recognition at a later time, and
1. 捕获可能在以后提交以供识别的语音音频,以及
2. recording voice or video mails.
2. 录制语音或视频邮件。
Both these applications require functionality above and beyond that specified by protocols such as RTSP [RFC2326]. This includes audio endpointing (i.e., detecting speech or silence). Support for video is OPTIONAL and is mainly intended for capturing video mails, which may require the speech or audio processing mentioned above.
这两种应用程序都需要超出RTSP[RFC2326]等协议规定的功能。这包括音频端点(即,检测语音或静音)。对视频的支持是可选的,主要是捕获可能需要上述语音或音频处理的视频邮件。
A recorder MUST provide endpointing capabilities for suppressing silence at the beginning and end of a recording, and it MAY also suppress silence in the middle of a recording. If such suppression is done, the recorder MUST maintain timing metadata to indicate the actual time stamps of the recorded media.
记录器必须提供在记录的开始和结束时抑制静默的端点能力,并且它还可以抑制记录中间的静默。如果进行了这种抑制,记录器必须维护定时元数据,以指示所记录介质的实际时间戳。
See the discussion on the sensitivity of saved waveforms in Section 12.
参见第12节中关于保存波形灵敏度的讨论。
Idle                   Recording
State                  State
|                       |
|---------RECORD------->|
|                       |
|<------STOP------------|
|                       |
|<--RECORD-COMPLETE-----|
|                       |
|              |--------|
|              | START-OF-INPUT
|              |------->|
|                       |
|              |--------|
|              | START-INPUT-TIMERS
|              |------->|
|                       |
Recorder State Machine
记录器状态机
The recorder resource supports the following methods.
记录器资源支持以下方法。
recorder-method = "RECORD" / "STOP" / "START-INPUT-TIMERS"
The recorder resource can generate the following events.
记录器资源可以生成以下事件。
recorder-event = "START-OF-INPUT" / "RECORD-COMPLETE"
Method invocations for the recorder resource can contain resource-specific header fields containing request options and information to augment the Method, Response, or Event message it is associated with.
记录器资源的方法调用可以包含特定于资源的头字段,其中包含请求选项和信息,以增强与之关联的方法、响应或事件消息。
recorder-header = sensitivity-level / no-input-timeout / completion-cause / completion-reason / failed-uri / failed-uri-cause / record-uri / media-type / max-time / trim-length / final-silence / capture-on-speech / ver-buffer-utterance / start-input-timers / new-audio-channel
To filter out background noise and not mistake it for speech, the recorder can support a variable level of sound sensitivity. The Sensitivity-Level header field is a float value between 0.0 and 1.0 and allows the client to set the sensitivity level for the recorder. This header field MAY occur in RECORD, SET-PARAMS, or GET-PARAMS. A higher value for this header field means higher sensitivity. The default value for this header field is implementation specific.
为了滤除背景噪音,避免误认为是语音,录音机可以支持不同级别的声音灵敏度。灵敏度级别标题字段是介于0.0和1.0之间的浮点值,允许客户端设置记录器的灵敏度级别。此标题字段可能出现在记录、SET-PARAMS或GET-PARAMS中。此标题字段的值越高,表示灵敏度越高。此标头字段的默认值是特定于实现的。
sensitivity-level = "Sensitivity-Level" ":" FLOAT CRLF
When recording is started and there is no speech detected for a certain period of time, the recorder can send a RECORD-COMPLETE event to the client and terminate the record operation. The No-Input-Timeout header field can set this timeout value. The value is in milliseconds. This header field MAY occur in RECORD, SET-PARAMS, or GET-PARAMS. The value for this header field ranges from 0 to an implementation-specific maximum value. The default value for this header field is implementation specific.
当开始录制且在一定时间段内未检测到语音时,录制器可以向客户端发送录制完成事件并终止录制操作。“无输入超时”标题字段可以设置此超时值。该值以毫秒为单位。此标题字段可能出现在记录、SET-PARAMS或GET-PARAMS中。此标头字段的值范围从0到特定于实现的最大值。此标头字段的默认值是特定于实现的。
no-input-timeout = "No-Input-Timeout" ":" 1*19DIGIT CRLF
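Since both Sensitivity-Level and No-Input-Timeout may be set through SET-PARAMS, a client could configure them once for the channel as sketched below; the request-id and the particular values (0.7 and 5000 milliseconds) are illustrative, not defaults:

C->S: MRCP/2.0 ... SET-PARAMS 543270
      Channel-Identifier:32AECB23433802@recorder
      Sensitivity-Level:0.7
      No-Input-Timeout:5000

S->C: MRCP/2.0 ... 543270 200 COMPLETE
      Channel-Identifier:32AECB23433802@recorder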
This header field MUST be part of a RECORD-COMPLETE event from the recorder resource to the client. This indicates the reason behind the RECORD method completion. This header field MUST be sent in the RECORD responses if they return with a failure status and a COMPLETE state. In the ABNF below, the 'cause-code' contains a numerical value selected from the Cause-Code column of the following table. The 'cause-name' contains the corresponding token selected from the Cause-Name column.
此标头字段必须是从记录器资源到客户端的记录完成事件的一部分。这表示记录方法完成的原因。如果返回失败状态和完整状态,则必须在记录响应中发送此标题字段。在下面的ABNF中,“原因代码”包含从下表的“原因代码”列中选择的数值。“原因名称”包含从“原因名称”列中选择的相应标记。
completion-cause = "Completion-Cause" ":" cause-code SP cause-name CRLF cause-code = 3DIGIT cause-name = *VCHAR
+------------+-----------------------+------------------------------+
| Cause-Code | Cause-Name            | Description                  |
+------------+-----------------------+------------------------------+
| 000        | success-silence       | RECORD completed with a      |
|            |                       | silence at the end.          |
| 001        | success-maxtime       | RECORD completed after       |
|            |                       | reaching maximum recording   |
|            |                       | time specified in record     |
|            |                       | method.                      |
| 002        | no-input-timeout      | RECORD failed due to no      |
|            |                       | input.                       |
| 003        | uri-failure           | Failure accessing the record |
|            |                       | URI.                         |
| 004        | error                 | RECORD request terminated    |
|            |                       | prematurely due to a         |
|            |                       | recorder error.              |
+------------+-----------------------+------------------------------+
This header field MAY be present in a RECORD-COMPLETE event coming from the recorder resource to the client. It contains the reason text behind the RECORD request completion. This header field communicates text describing the reason for the failure.
此标头字段可能出现在从记录器资源到客户端的记录完成事件中。它包含记录请求完成后的原因文本。此标题字段传递描述故障原因的文本。
The completion reason text is provided for client use in logs and for debugging and instrumentation purposes. Clients MUST NOT interpret the completion reason text.
完成原因文本用于日志中的客户端使用以及调试和检测目的。客户不得解释完成原因文本。
completion-reason = "Completion-Reason" ":" quoted-string CRLF
When a recorder method needs to post the audio to a URI and access to the URI fails, the server MUST provide the failed URI in this header field in the method response.
当录像机方法需要将音频发布到URI并且对URI的访问失败时,服务器必须在方法响应的这个头字段中提供失败的URI。
failed-uri = "Failed-URI" ":" absoluteURI CRLF
When a recorder method needs to post the audio to a URI and access to the URI fails, the server MAY provide the URI-specific or protocol-specific response code through this header field in the method response. The value encoding is UTF-8 (RFC 3629 [RFC3629]) to accommodate any access protocol -- some access protocols might have a response string instead of a numeric response code.
当记录器方法需要将音频发布到URI并且对URI的访问失败时,服务器可以通过方法响应中的该报头字段提供特定于URI或特定于协议的响应代码。值编码为UTF-8(RFC 3629[RFC3629]),以适应任何访问协议——某些访问协议可能具有响应字符串而不是数字响应代码。
failed-uri-cause = "Failed-URI-Cause" ":" 1*UTFCHAR CRLF
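As a hypothetical illustration of these two header fields (the URI and the protocol-specific cause text are made up for the example), a RECORD request that could not write to its Record-URI might complete as follows:

S->C: MRCP/2.0 ... RECORD-COMPLETE 543257 COMPLETE
      Channel-Identifier:32AECB23433802@recorder
      Completion-Cause:003 uri-failure
      Failed-URI:http://mediaserver.example.com/recordings/myfile.wav
      Failed-URI-Cause:403 Forbidden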
When a recorder method contains this header field, the server MUST capture the audio and store it. If the header field is present but specified with no value, the server MUST store the content locally and generate a URI that points to it. This URI is then returned in either the STOP response or the RECORD-COMPLETE event. If the header field in the RECORD method specifies a URI, the server MUST attempt to capture and store the audio at that location. If this header field is not specified in the RECORD request, the server MUST capture the audio, MUST encode it, and MUST send it in the STOP response or the RECORD-COMPLETE event as a message body. In this case, the
当录像机方法包含此标头字段时,服务器必须捕获音频并将其存储。如果header字段存在但没有指定值,则服务器必须在本地存储内容并生成指向该内容的URI。然后在停止响应或记录完成事件中返回此URI。如果RECORD方法中的header字段指定了URI,则服务器必须尝试在该位置捕获并存储音频。如果记录请求中未指定此标头字段,则服务器必须捕获音频,对其进行编码,并且必须在停止响应或记录完成事件中将其作为消息体发送。在这种情况下
response carrying the audio content MUST include a Content ID (cid) [RFC2392] value in this header pointing to the Content-ID in the message body.
携带音频内容的响应必须在此标头中包含指向消息正文中内容ID的内容ID(cid)[RFC2392]值。
The server MUST also return the size in octets and the duration in milliseconds of the recorded audio waveform as parameters associated with the header field.
服务器还必须返回录制音频波形的大小(以八位字节为单位)和持续时间(以毫秒为单位),作为与标头字段关联的参数。
Implementations MUST support 'http' [RFC2616], 'https' [RFC2818], 'file' [RFC3986], and 'cid' [RFC2392] schemes in the URI. Note that implementations already exist that support other schemes.
实现必须支持URI中的“http”[RFC2616]、“https”[RFC2818]、“文件”[RFC3986]和“cid”[RFC2392]方案。请注意,支持其他方案的实现已经存在。
record-uri = "Record-URI" ":" ["<" uri ">" ";" "size" "=" 1*19DIGIT ";" "duration" "=" 1*19DIGIT] CRLF
A RECORD method MUST contain this header field, which specifies to the server the media type of the captured audio or video.
录制方法必须包含此标头字段,该字段向服务器指定捕获的音频或视频的媒体类型。
media-type = "Media-Type" ":" media-type-value CRLF
When recording is started, this specifies the maximum length of the recording in milliseconds, calculated from the time the actual capture and store begins and is not necessarily the time the RECORD method is received. It specifies the duration before silence suppression, if any, has been applied by the recorder resource. After this time, the recording stops and the server MUST return a RECORD-COMPLETE event to the client having a request-state of COMPLETE. This header field MAY occur in RECORD, SET-PARAMS, or GET-PARAMS. The value for this header field ranges from 0 to an implementation-specific maximum value. A value of 0 means infinity, and hence the recording continues until one or more of the other stop conditions are met. The default value for this header field is 0.
开始录制时,这指定录制的最大长度(以毫秒为单位),从实际捕获和存储开始的时间开始计算,不一定是接收录制方法的时间。它指定记录器资源应用静默抑制(如果有)之前的持续时间。在此时间之后,录制停止,服务器必须向请求状态为COMPLETE的客户端返回RECORD-COMPLETE事件。此标题字段可能出现在记录、SET-PARAMS或GET-PARAMS中。此标头字段的值范围从0到特定于实现的最大值。值为0表示无穷大,因此记录将继续,直到满足一个或多个其他停止条件。此标题字段的默认值为0。
max-time = "Max-Time" ":" 1*19DIGIT CRLF
This header field MAY be sent on a STOP method and specifies the length of audio to be trimmed from the end of the recording after the stop. The length is interpreted to be in milliseconds. The default value for this header field is 0.
此标题字段可通过停止方法发送,并指定停止后从录制结束开始修剪的音频长度。长度被解释为以毫秒为单位。此标题字段的默认值为0。
trim-length = "Trim-Length" ":" 1*19DIGIT CRLF
When the recorder is started and the actual capture begins, this header field specifies the length of silence in the audio that is to be interpreted as the end of the recording. This header field MAY occur in RECORD, SET-PARAMS, or GET-PARAMS. The value for this header field ranges from 0 to an implementation-specific maximum value and is interpreted to be in milliseconds. A value of 0 means infinity, and hence the recording will continue until one of the other stop conditions are met. The default value for this header field is implementation specific.
当录音机启动且实际捕获开始时,此标题字段指定音频中的静音长度,该长度将被解释为录音结束。此标题字段可能出现在记录、SET-PARAMS或GET-PARAMS中。此标头字段的值范围从0到特定于实现的最大值,并被解释为以毫秒为单位。值为0表示无穷大,因此记录将继续,直到满足其他停止条件之一。此标头字段的默认值是特定于实现的。
final-silence = "Final-Silence" ":" 1*19DIGIT CRLF
If "false", the recorder MUST start capturing immediately when started. If "true", the recorder MUST wait for the endpointing functionality to detect speech before it starts capturing. This header field MAY occur in the RECORD, SET-PARAMS, or GET-PARAMS. The value for this header field is a Boolean. The default value for this header field is "false".
如果为“false”,则记录器必须在启动时立即开始捕获。如果为“true”,记录器必须等待端点功能检测语音,然后才能开始捕获。此标题字段可能出现在记录、SET-PARAMS或GET-PARAMS中。此标题字段的值为布尔值。此标题字段的默认值为“false”。
capture-on-speech = "Capture-On-Speech " ":" BOOLEAN CRLF
This header field is the same as the one described for the verifier resource (see Section 11.4.14). This tells the server to buffer the utterance associated with this recording request into the verification buffer. Sending this header field is permitted only if the verification buffer is for the session. This buffer is shared across resources within a session. It gets instantiated when a verifier resource is added to this session and is released when the verifier resource is released from the session.
该标题字段与为验证器资源描述的标题字段相同(见第11.4.14节)。这会告诉服务器将与此录制请求相关的话语缓冲到验证缓冲区中。仅当验证缓冲区用于会话时,才允许发送此标头字段。此缓冲区在会话内的资源之间共享。它在将验证器资源添加到此会话时实例化,并在从会话释放验证器资源时释放。
This header field MAY be sent as part of the RECORD request. A value of "false" tells the recorder resource to start the operation, but not to start the no-input timer until the client sends a START-INPUT-TIMERS request to the recorder resource. This is useful in the scenario when the recorder and synthesizer resources are not part of the same session. When a kill-on-barge-in prompt is being played, the client may want the RECORD request to be simultaneously active so that it can detect and implement kill-on-barge-in (see Section 8.4.2). But at the same time, the client doesn't want the recorder resource to start the no-input timers until the prompt is finished. The default value is "true".
此标题字段可以作为记录请求的一部分发送。值“false”告诉记录器资源启动操作,但在客户端向记录器资源发送启动输入计时器请求之前,不要启动无输入计时器。这在记录器和合成器资源不属于同一会话的情况下非常有用。播放驳船压井提示时,客户可能希望记录请求同时激活,以便能够检测并实施驳船压井(见第8.4.2节)。但同时,客户端不希望记录器资源在提示完成之前启动无输入计时器。默认值为“true”。
start-input-timers = "Start-Input-Timers" ":" BOOLEAN CRLF
This header field is the same as the one described for the recognizer resource (see Section 9.4.23).
该标题字段与识别器资源的标题字段相同(见第9.4.23节)。
If the RECORD request did not have a Record-URI header field, the STOP response or the RECORD-COMPLETE event MUST contain a message body carrying the captured audio. In this case, the message carrying the audio content has a Record-URI header field with a Content ID value pointing to the message body entity that contains the recorded audio. See Section 10.4.7 for details.
如果记录请求没有记录URI头字段,则停止响应或记录完成事件必须包含一个携带捕获音频的消息体。在这种情况下,携带音频内容的消息具有记录URI报头字段,其内容ID值指向包含记录音频的消息主体实体。详见第10.4.7节。
The RECORD request places the recorder resource in the recording state. Depending on the header fields specified in the RECORD method, the resource may start recording the audio immediately or wait for the endpointing functionality to detect speech in the audio. The audio is then made available to the client either in the message body or as specified by Record-URI.
记录请求将记录器资源置于记录状态。根据记录方法中指定的头字段,资源可以立即开始记录音频,或者等待端点功能检测音频中的语音。然后,音频在消息体中或按照记录URI的指定提供给客户端。
The server MUST support the 'https' URI scheme and MAY support other schemes. Note that, due to the sensitive nature of voice recordings, any protocols used for dereferencing SHOULD employ integrity and confidentiality, unless other means, such as use of a controlled environment (see Section 4.2), are employed.
服务器必须支持“https”URI方案,并且可能支持其他方案。注意,由于语音记录的敏感性,用于解引用的任何协议都应采用完整性和保密性,除非采用其他方式,如使用受控环境(见第4.2节)。
If a RECORD operation is already in progress, invoking this method causes the server to issue a response having a status-code of 402 "Method not valid in this state" and a request-state of COMPLETE.
如果记录操作已经在进行中,调用此方法将导致服务器发出状态代码为402“method not valid in this state”(方法在此状态下无效)和请求状态为COMPLETE(完成)的响应。
If the Record-URI is not valid, a status-code of 404 "Illegal Value for Header Field" is returned in the response. If it is impossible for the server to create the requested stored content, a status-code of 407 "Method or Operation Failed" is returned.
如果记录URI无效,响应中将返回404“头字段的非法值”状态代码。如果服务器无法创建请求的存储内容,则返回状态代码407“方法或操作失败”。
If the type specified in the Media-Type header field is not supported, the server MUST respond with a status-code of 409 "Unsupported Header Field Value" with the Media-Type header field in its response.
如果不支持媒体类型标头字段中指定的类型,服务器必须在其响应中使用媒体类型标头字段以409“Unsupported header field Value”的状态代码进行响应。
When the recording operation is initiated, the response indicates an IN-PROGRESS request state. The server MAY generate a subsequent START-OF-INPUT event when speech is detected. Upon completion of the recording operation, the server generates a RECORD-COMPLETE event.
启动记录操作时,响应指示正在进行的请求状态。当检测到语音时,服务器可生成随后的输入启动事件。记录操作完成后,服务器将生成记录完成事件。
C->S: MRCP/2.0 ... RECORD 543257 Channel-Identifier:32AECB23433802@recorder Record-URI:<file://mediaserver/recordings/myfile.wav> Media-Type:audio/wav Capture-On-Speech:true Final-Silence:300 Max-Time:6000
S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS Channel-Identifier:32AECB23433802@recorder
S->C: MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS Channel-Identifier:32AECB23433802@recorder
S->C: MRCP/2.0 ... RECORD-COMPLETE 543257 COMPLETE Channel-Identifier:32AECB23433802@recorder Completion-Cause:000 success-silence Record-URI:<file://mediaserver/recordings/myfile.wav>; size=242552;duration=25645
RECORD Example
记录示例
The STOP method moves the recorder from the recording state back to the idle state. If a RECORD request is active and the STOP request successfully terminates it, then the STOP response MUST contain an Active-Request-Id-List header field containing the RECORD request-id that was terminated. In this case, no RECORD-COMPLETE event is sent
停止方法将记录器从记录状态移回空闲状态。如果记录请求处于活动状态且停止请求成功终止,则停止响应必须包含活动请求Id列表标题字段,其中包含已终止的记录请求Id。在这种情况下,不发送记录完整事件
for the terminated request. If there was no recording active, then the response MUST NOT contain an Active-Request-Id-List header field. If the recording was a success, the STOP response MUST contain a Record-URI header field pointing to the recorded audio content or to a typed entity in the body of the STOP response containing the recorded audio. The STOP method MAY have a Trim-Length header field, in which case the specified length of audio is trimmed from the end of the recording after the stop. In any case, the response MUST contain a status-code of 200 "Success".
对于终止的请求。如果没有活动录制,则响应不得包含活动请求Id列表标题字段。如果录制成功,停止响应必须包含一个Record URI头字段,该字段指向录制的音频内容或包含录制音频的停止响应主体中的键入实体。STOP方法可能有一个Trim Length头字段,在这种情况下,指定的音频长度在停止后从录制结束时开始修剪。在任何情况下,响应必须包含状态代码200“成功”。
C->S: MRCP/2.0 ... RECORD 543257 Channel-Identifier:32AECB23433802@recorder Record-URI:<file://mediaserver/recordings/myfile.wav> Capture-On-Speech:true Final-Silence:300 Max-Time:6000
S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS Channel-Identifier:32AECB23433802@recorder
S->C: MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS Channel-Identifier:32AECB23433802@recorder
C->S: MRCP/2.0 ... STOP 543257 Channel-Identifier:32AECB23433802@recorder Trim-Length:200
S->C: MRCP/2.0 ... 543257 200 COMPLETE Channel-Identifier:32AECB23433802@recorder Record-URI:<file://mediaserver/recordings/myfile.wav>; size=324253;duration=24561 Active-Request-Id-List:543257
STOP Example
停止举例
If the recording completes due to no input, silence after speech, or reaching the max-time, the server MUST generate the RECORD-COMPLETE event to the client with a request-state of COMPLETE. If the recording was a success, the RECORD-COMPLETE event contains a Record-URI header field pointing to the recorded audio file on the server or to a typed entity in the message body containing the recorded audio.
如果由于没有输入、语音后静音或达到最大时间而导致录制完成,则服务器必须向客户端生成请求状态为COMPLETE的RECORD-COMPLETE事件。如果录制成功,RECORD-COMPLETE事件将包含一个RECORD URI头字段,该字段指向服务器上录制的音频文件或包含录制音频的消息正文中的键入实体。
C->S: MRCP/2.0 ... RECORD 543257 Channel-Identifier:32AECB23433802@recorder Record-URI:<file://mediaserver/recordings/myfile.wav> Capture-On-Speech:true Final-Silence:300 Max-Time:6000
S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS Channel-Identifier:32AECB23433802@recorder
S->C: MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS Channel-Identifier:32AECB23433802@recorder
S->C: MRCP/2.0 ... RECORD-COMPLETE 543257 COMPLETE Channel-Identifier:32AECB23433802@recorder Completion-Cause:000 success Record-URI:<file://mediaserver/recordings/myfile.wav>; size=325325;duration=24652
RECORD-COMPLETE Example
记录完整的示例
This request is sent from the client to the recorder resource when it discovers that a kill-on-barge-in prompt has finished playing (see Section 8.4.2). This is useful in the scenario when the recorder and synthesizer resources are not in the same MRCPv2 session. When a kill-on-barge-in prompt is being played, the client wants the RECORD request to be simultaneously active so that it can detect and implement kill-on-barge-in. But at the same time, the client doesn't want the recorder resource to start the no-input timers until the prompt is finished. The Start-Input-Timers header field in the RECORD request allows the client to say if the timers should be started or not. In the above case, the recorder resource does not start the timers until the client sends a START-INPUT-TIMERS method to the recorder.
当客户端发现驳船上的压井提示已完成播放时,该请求从客户端发送到记录器资源(参见第8.4.2节)。这在记录器和合成器资源不在同一MRCPv2会话中的场景中非常有用。播放驳船压井提示时,客户机希望记录请求同时处于活动状态,以便能够检测并实施驳船压井。但同时,客户端不希望记录器资源在提示完成之前启动无输入计时器。记录请求中的Start Input Timers header字段允许客户机说出是否应该启动计时器。在上述情况下,在客户端向记录器发送start-INPUT-timers方法之前,记录器资源不会启动计时器。
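A minimal sketch of this exchange on the recorder channel (the request-id is arbitrary):

C->S: MRCP/2.0 ... START-INPUT-TIMERS 543260
      Channel-Identifier:32AECB23433802@recorder

S->C: MRCP/2.0 ... 543260 200 COMPLETE
      Channel-Identifier:32AECB23433802@recorder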
The START-OF-INPUT event is returned from the server to the client once the server has detected speech. This event is always returned by the recorder resource when speech has been detected. The recorder resource also MUST send a Proxy-Sync-Id header field with a unique value for this event.
一旦服务器检测到语音,输入开始事件将从服务器返回到客户端。当检测到语音时,记录器资源始终返回此事件。记录器资源还必须为此事件发送具有唯一值的代理同步Id标头字段。
S->C: MRCP/2.0 ... START-OF-INPUT 543259 IN-PROGRESS Channel-Identifier:32AECB23433801@recorder Proxy-Sync-Id:987654321
This section describes the methods, responses and events employed by MRCPv2 for doing speaker verification/identification.
本节介绍MRCPv2用于进行说话人验证/识别的方法、响应和事件。
Speaker verification is a voice authentication methodology that can be used to identify the speaker in order to grant the user access to sensitive information and transactions. Because speech is a biometric, a number of essential security considerations related to biometric authentication technologies apply to its implementation and usage. Implementers should carefully read Section 12 in this document and the corresponding section of the SPEECHSC requirements [RFC4313]. Implementers and deployers of this technology are strongly encouraged to check the state of the art for any new risks and solutions that might have been developed.
说话人验证是一种语音验证方法,可用于识别说话人,以便允许用户访问敏感信息和事务。由于语音是一种生物识别技术,因此与生物识别认证技术相关的一些基本安全注意事项适用于语音的实现和使用。实施者应仔细阅读本文件第12节和演讲要求[RFC4313]的相应章节。强烈鼓励此技术的实施者和部署者检查最新技术状态,以了解可能已开发的任何新风险和解决方案。
In speaker verification, a recorded utterance is compared to a previously stored voiceprint, which is in turn associated with a claimed identity for that user. Verification typically consists of two phases: a designation phase to establish the claimed identity of the caller and an execution phase in which a voiceprint is either created (training) or used to authenticate the claimed identity (verification).
在说话人验证中,将记录的话语与先前存储的声纹进行比较,而声纹又与该用户声称的身份相关联。验证通常包括两个阶段:一个是建立呼叫者声明身份的指定阶段,另一个是创建声纹(培训)或用于验证声明身份的执行阶段(验证)。
Speaker identification is the process of associating an unknown speaker with a member in a population. It does not employ a claim of identity. When an individual claims to belong to a group (e.g., one of the owners of a joint bank account) a group authentication is performed. This is generally implemented as a kind of verification involving comparison with more than one voice model. It is sometimes called 'multi-verification'. If the individual speaker can be identified from the group, this may be useful for applications where multiple users share the same access privileges to some data or application. Speaker identification and group authentication are also done in two phases, a designation phase and an execution phase. Note that, from a functionality standpoint, identification can be thought of as a special case of group authentication (if the individual is identified) where the group is the entire population, although the implementation of speaker identification may be different from the way group authentication is performed. To accommodate single-voiceprint verification, verification against multiple voiceprints, group authentication, and identification, this specification provides a single set of methods that can take a list of identifiers, called "voiceprint identifiers", and return a list of identifiers, with a score for each that represents how well the input speech matched each identifier. The input and output lists of identifiers do not have to match, allowing a vendor-specific group identifier to be used as input to indicate that identification is to
说话人识别是将未知说话人与群体中的成员关联起来的过程。它不使用身份声明。当个人声称属于某个集团(例如,联合银行账户的所有者之一)时,将执行集团身份验证。这通常被实现为一种验证,包括与多个语音模型进行比较。它有时被称为“多重验证”。如果可以从组中识别出单个说话人,这对于多个用户共享对某些数据或应用程序的相同访问权限的应用程序可能很有用。说话人识别和组认证也分为两个阶段,指定阶段和执行阶段。注意,从功能性的角度来看,识别可以被认为是组认证(如果识别了个人)的特殊情况,其中组是整个群体,尽管说话人识别的实现可能不同于执行组认证的方式。为了适应单个声纹验证、针对多个声纹的验证、组身份验证和标识,本规范提供了一组方法,这些方法可以获取标识符列表,称为“声纹标识符”,并返回标识符列表,每个都有一个分数,表示输入语音与每个标识符的匹配程度。标识符的输入和输出列表不必匹配,允许使用特定于供应商的组标识符作为输入,以指示要进行标识
be performed. In this specification, the terms "identification" and "multi-verification" are used to indicate that the input represents a group (potentially the entire population) and that results for multiple voiceprints may be returned.
将被执行。在本规范中,术语“识别”和“多重验证”用于表示输入代表一个组(可能是整个群体),并且可以返回多个声纹的结果。
It is possible for a verifier resource to share the same session with a recognizer resource or to operate independently. In order to share the same session, the verifier and recognizer resources MUST be allocated from within the same SIP dialog. Otherwise, an independent verifier resource, running on the same physical server or a separate one, will be set up. Note that, in addition to allowing both resources to be allocated in the same INVITE, it is possible to allocate one initially and the other later via a re-INVITE.
验证器资源可以与识别器资源共享同一会话或独立操作。为了共享同一会话,必须从同一SIP对话框中分配验证器和识别器资源。否则,将设置在同一物理服务器或单独的物理服务器上运行的独立验证器资源。请注意,除了允许在同一邀请中分配两个资源外,还可以通过重新邀请先分配一个资源,然后再分配另一个资源。
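For illustration, the control channel for a verifier resource is requested in the SDP offer with a media description of type TCP/MRCPv2 whose resource attribute is "speakverify"; the port, the cmid value (which references the mid of the associated audio description), and the choice between a new and an existing control connection are placeholders here:

   m=application 9 TCP/MRCPv2 1
   a=setup:active
   a=connection:existing
   a=resource:speakverify
   a=cmid:1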
Some of the speaker verification methods, described below, apply only to a specific mode of operation.
下面介绍的一些扬声器验证方法仅适用于特定的操作模式。
The verifier resource has a verification buffer associated with it (see Section 11.4.14). This allows the storage of speech utterances for the purposes of verification, identification, or training from the buffered speech. This buffer is owned by the verifier resource, but other input resources (such as the recognizer resource or recorder resource) may write to it. This allows the speech received as part of a recognition or recording operation to be later used for verification, identification, or training. Access to the buffer is limited to one operation at time. Hence, when the resource is doing read, write, or delete operations, such as a RECOGNIZE with ver-buffer-utterance turned on, another operation involving the buffer fails with a status-code of 402. The verification buffer can be cleared by a CLEAR-BUFFER request from the client and is freed when the verifier resource is deallocated or the session with the server terminates.
验证器资源具有与其相关联的验证缓冲区(见第11.4.14节)。这允许存储语音,以便验证、识别或训练缓冲语音。此缓冲区由验证器资源拥有,但其他输入资源(如识别器资源或记录器资源)可能会写入。这使得作为识别或记录操作的一部分接收的语音可以稍后用于验证、识别或训练。对缓冲区的访问一次仅限于一个操作。因此,当资源正在执行读、写或删除操作时,例如在ver缓冲区话语打开的情况下进行识别,涉及缓冲区的另一操作失败,状态代码为402。验证缓冲区可以通过来自客户端的CLEAR-buffer请求来清除,并在释放验证器资源或与服务器的会话终止时释放。
The verification buffer is different from collecting waveforms and processing them using either the real-time audio stream or stored audio, because this buffering mechanism does not simply accumulate speech to a buffer. The verification buffer MAY contain additional information gathered by the recognizer resource that serves to improve verification performance.
验证缓冲区不同于收集波形并使用实时音频流或存储的音频对其进行处理,因为这种缓冲机制并不只是将语音累积到缓冲区。验证缓冲区可包含识别器资源收集的用于提高验证性能的附加信息。
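As a sketch of how the buffer is typically filled and later consumed (the request-ids, the speakverify channel identifier, and the grammar URI are placeholders, intermediate responses and events are omitted, and a verification session is assumed to have already been started), a recognition request marks its utterance for buffering and a subsequent verification request then operates on the buffered audio:

C->S: MRCP/2.0 ... RECOGNIZE 543310
      Channel-Identifier:32AECB23433801@speechrecog
      Ver-Buffer-Utterance:true
      Content-Type:text/uri-list
      Content-Length:...

      session:request1@form-level.store

C->S: MRCP/2.0 ... VERIFY-FROM-BUFFER 543311
      Channel-Identifier:32AECB23433803@speakverify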
Speaker verification may operate in a training or a verification session. Starting one of these sessions does not change the state of the verifier resource, i.e., it remains idle. Once a verification or training session is started, then utterances are trained or verified
说话人验证可以在培训或验证课程中进行。启动其中一个会话不会更改验证器资源的状态,即它保持空闲。一旦验证或培训课程开始,就要对话语进行培训或验证
by calling the VERIFY or VERIFY-FROM-BUFFER method. The state of the verifier resources goes from IDLE to VERIFYING state each time VERIFY or VERIFY-FROM-BUFFER is called.
通过调用VERIFY或VERIFY-FROM-BUFFER方法。每次调用VERIFY或VERIFY-from-BUFFER时,验证器资源的状态从空闲变为验证状态。
Idle                     Session Opened            Verifying/Training
State                    State                     State
|                        |                         |
|---START-SESSION------->|                         |
|                        |                         |
|                        |----------|              |
|                        | START-SESSION           |
|                        |<---------|              |
|                        |                         |
|<--END-SESSION----------|                         |
|                        |                         |
|                        |---------VERIFY--------->|
|                        |                         |
|                        |---VERIFY-FROM-BUFFER--->|
|                        |                         |
|                        |----------|              |
|                        | VERIFY-ROLLBACK         |
|                        |<---------|              |
|                        |                         |
|                        |                |--------|
|                        |                | GET-INTERMEDIATE-RESULT
|                        |                |------->|
|                        |                         |
|                        |                |--------|
|                        |                | START-INPUT-TIMERS
|                        |                |------->|
|                        |                         |
|                        |                |--------|
|                        |                | START-OF-INPUT
|                        |                |------->|
|                        |                         |
|                        |<-VERIFICATION-COMPLETE--|
|                        |                         |
|                        |<--------STOP------------|
|                        |                         |
|                        |----------|              |
|                        | STOP                    |
|                        |<---------|              |
|                        |                         |
|----------|             |                         |
| STOP                   |                         |
|<---------|             |                         |
|                        |                         |
|----------|             |                         |
| CLEAR-BUFFER           |                         |
|<---------|             |                         |
|                        |                         |
|                        |----------|              |
|                        | CLEAR-BUFFER            |
|                        |<---------|              |
|                        |                         |
|----------|             |                         |
| QUERY-VOICEPRINT       |                         |
|<---------|             |                         |
|                        |                         |
|                        |----------|              |
|                        | QUERY-VOICEPRINT        |
|                        |<---------|              |
|                        |                         |
|----------|             |                         |
| DELETE-VOICEPRINT      |                         |
|<---------|             |                         |
|                        |                         |
|                        |----------|              |
|                        | DELETE-VOICEPRINT       |
|                        |<---------|              |
|                        |                         |
Verifier Resource State Machine
验证器资源状态机
The verifier resource supports the following methods.
验证器资源支持以下方法。
verifier-method = "START-SESSION" / "END-SESSION" / "QUERY-VOICEPRINT" / "DELETE-VOICEPRINT" / "VERIFY" / "VERIFY-FROM-BUFFER" / "VERIFY-ROLLBACK" / "STOP" / "CLEAR-BUFFER" / "START-INPUT-TIMERS" / "GET-INTERMEDIATE-RESULT"
These methods allow the client to control the mode and target of verification or identification operations within the context of a session. All the verification input operations that occur within a session can be used to create, update, or validate against the
这些方法允许客户端在会话上下文中控制验证或标识操作的模式和目标。会话中发生的所有验证输入操作都可用于根据会话创建、更新或验证
voiceprint specified during the session. At the beginning of each session, the verifier resource is reset to the state it had prior to any previous verification session.
会话期间指定的声纹。在每个会话开始时,将验证器资源重置为其在任何先前验证会话之前的状态。
Verification/identification operations can be executed against live or buffered audio. The verifier resource provides methods for collecting and evaluating live audio data, and methods for controlling the verifier resource and adjusting its configured behavior.
可以对实时或缓冲音频执行验证/识别操作。验证器资源提供用于收集和评估实时音频数据的方法,以及用于控制验证器资源和调整其配置行为的方法。
There are no dedicated methods for collecting buffered audio data. This is accomplished by calling VERIFY, RECOGNIZE, or RECORD as appropriate for the resource, with the header field Ver-Buffer-Utterance. Then, when the following method is called, verification is performed using the set of buffered audio.
没有专门用于收集缓冲音频数据的方法。这是通过调用适合于资源的VERIFY、RECOGNIZE或RECORD来实现的,使用头字段Ver Buffer话语。然后,当调用以下方法时,使用缓冲音频集执行验证。
1. VERIFY-FROM-BUFFER
1. 从缓冲区验证
The following methods are used for verification of live audio utterances:
以下方法用于验证现场音频讲话:
1. VERIFY
1. 验证
2. START-INPUT-TIMERS
2. 启动输入定时器
The following methods are used for configuring the verifier resource and for establishing resource states:
以下方法用于配置验证器资源和建立资源状态:
1. START-SESSION
1. 启动会议
2. END-SESSION
2. 结束会议
3. QUERY-VOICEPRINT
3. 查询声纹
4. DELETE-VOICEPRINT
4. 删除声纹
5. VERIFY-ROLLBACK
5. 验证-回滚
6. STOP
6. 停止
7. CLEAR-BUFFER
7. 清除缓冲区
The following method allows the polling of a verification in progress for intermediate results.
以下方法允许轮询正在进行的验证以获得中间结果。
1. GET-INTERMEDIATE-RESULT
1. 结果
The verifier resource generates the following events.
验证器资源生成以下事件。
verifier-event = "VERIFICATION-COMPLETE" / "START-OF-INPUT"
验证人事件=“验证完成”/“输入开始”
A verifier resource message can contain header fields containing request options and information to augment the Request, Response, or Event message it is associated with.
验证器资源消息可以包含包含请求选项和信息的头字段,以增加与其关联的请求、响应或事件消息。
verification-header = repository-uri / voiceprint-identifier / verification-mode / adapt-model / abort-model / min-verification-score / num-min-verification-phrases / num-max-verification-phrases / no-input-timeout / save-waveform / media-type / waveform-uri / voiceprint-exists / ver-buffer-utterance / input-waveform-uri / completion-cause / completion-reason / speech-complete-timeout / new-audio-channel / abort-verification / start-input-timers
This header field specifies the voiceprint repository to be used or referenced during speaker verification or identification operations. This header field is required in the START-SESSION, QUERY-VOICEPRINT, and DELETE-VOICEPRINT methods.
此标题字段指定在说话人验证或识别操作期间要使用或引用的声纹存储库。此标题字段在START-SESSION、QUERY-VOICEPRINT和DELETE-VOICEPRINT方法中是必需的。
repository-uri = "Repository-URI" ":" uri CRLF
This header field specifies the claimed identity for verification applications. The claimed identity MAY be used to specify an existing voiceprint or to establish a new voiceprint. This header field MUST be present in the QUERY-VOICEPRINT and DELETE-VOICEPRINT methods. The Voiceprint-Identifier MUST be present in the START-SESSION method for verification operations. For identification or multi-verification operations, this header field MAY contain a list of voiceprint identifiers separated by semicolons. For identification operations, the client MAY also specify a voiceprint group identifier instead of a list of voiceprint identifiers.
此标头字段指定验证应用程序的声明标识。所声称的身份可用于指定现有声纹或建立新声纹。此标题字段必须出现在QUERY-VOICEPRINT和DELETE-VOICEPRINT方法中。声纹标识符必须存在于验证操作的启动会话方法中。对于标识或多重验证操作,此标题字段可能包含由分号分隔的声纹标识符列表。对于识别操作,客户端还可以指定声纹组标识符而不是声纹标识符的列表。
voiceprint-identifier = "Voiceprint-Identifier" ":" vid *[";" vid] CRLF vid = 1*VCHAR ["." 1*VCHAR]
This header field specifies the mode of the verifier resource and is set by the START-SESSION method. Acceptable values indicate whether the verification session will train a voiceprint ("train") or verify/identify using an existing voiceprint ("verify").
此标头字段指定验证器资源的模式,并由START-SESSION方法设置。可接受值指示验证会话是训练声纹(“训练”)还是使用现有声纹(“验证”)进行验证/识别。
Training and verification sessions both require the voiceprint Repository-URI to be specified in the START-SESSION. In many usage scenarios, however, the system does not know the speaker's claimed identity until a recognition operation has, for example, recognized an account number to which the user desires access. In order to allow the first few utterances of a dialog to be both recognized and verified, the verifier resource on the MRCPv2 server retains a buffer. In this buffer, the MRCPv2 server accumulates recognized utterances. The client can later execute a verification method and apply the buffered utterances to the current verification session.
培训和验证会话都要求在启动会话中指定声纹存储库URI。然而,在许多使用场景中,直到识别操作(例如)识别出用户想要访问的帐号,系统才知道说话人的声称身份。为了允许识别和验证对话框的前几句话,MRCPv2服务器上的验证器资源保留了一个缓冲区。在该缓冲区中,MRCPv2服务器累积已识别的话语。客户机稍后可以执行验证方法,并将缓冲语句应用于当前验证会话。
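A hypothetical START-SESSION for a verification (as opposed to training) session is sketched below; the repository URI, voiceprint identifier, request-id, and channel identifier are placeholders:

C->S: MRCP/2.0 ... START-SESSION 543320
      Channel-Identifier:32AECB23433803@speakverify
      Repository-URI:http://www.example.com/voiceprints/
      Verification-Mode:verify
      Voiceprint-Identifier:johnsmith.voiceprint

S->C: MRCP/2.0 ... 543320 200 COMPLETE
      Channel-Identifier:32AECB23433803@speakverify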
Some voice user interfaces may require additional user input that should not be subject to verification. For example, the user's input may have been recognized with low confidence and thus require a confirmation cycle. In such cases, the client SHOULD NOT execute the VERIFY or VERIFY-FROM-BUFFER methods to collect and analyze the caller's input. A separate recognizer resource can analyze the caller's response without any participation by the verifier resource.
某些语音用户界面可能需要额外的用户输入,不应进行验证。例如,用户的输入可能已被低置信度识别,因此需要一个确认周期。在这种情况下,客户端不应该执行VERIFY或VERIFY-FROM-BUFFER方法来收集和分析调用方的输入。单独的识别器资源可以分析调用方的响应,而不需要验证器资源的任何参与。
Once the following conditions have been met:
一旦满足以下条件:
1. the voiceprint identity has been successfully established through the Voiceprint-Identifier header fields of the START-SESSION method, and
1. 已通过启动会话方法的声纹标识符标头字段成功建立声纹标识,以及
2. the verification mode has been set to one of "train" or "verify",
2. 验证模式已设置为“列车”或“验证”模式之一,
the verifier resource can begin providing verification information during verification operations. If the verifier resource does not reach one of the two major states ("train" or "verify"), it MUST report an error condition in the MRCPv2 status code to indicate why the verifier resource is not ready for the corresponding usage.
验证器资源可以在验证操作期间开始提供验证信息。如果验证器资源未达到两种主要状态之一(“训练”或“验证”),则必须在MRCPv2状态代码中报告错误情况,以说明验证器资源未准备好进行相应使用的原因。
The value of verification-mode is persistent within a verification session. If the client attempts to change the mode during a verification session, the verifier resource reports an error and the mode retains its current value.
验证模式的值在验证会话中是持久的。如果客户端在验证会话期间尝试更改模式,则验证程序资源将报告错误,并且模式将保留其当前值。
verification-mode = "Verification-Mode" ":" verification-mode-string
verification-mode-string = "train" / "verify"
This header field indicates the desired behavior of the verifier resource after a successful verification operation. If the value of this header field is "true", the server SHOULD use audio collected during the verification session to update the voiceprint to account for ongoing changes in a speaker's incoming speech characteristics, unless local policy prohibits updating the voiceprint. If the value is "false" (the default), the server MUST NOT update the voiceprint. This header field MAY occur in the START-SESSION method.
此标头字段指示验证程序资源在成功验证操作后的所需行为。如果此标题字段的值为“true”,则服务器应使用验证会话期间收集的音频更新声纹,以说明演讲者传入语音特征的持续变化,除非当地政策禁止更新声纹。如果该值为“false”(默认值),则服务器不得更新声纹。此标头字段可能出现在启动会话方法中。
adapt-model = "Adapt-Model" ":" BOOLEAN CRLF
The Abort-Model header field indicates the desired behavior of the verifier resource upon session termination. If the value of this header field is "true", the server MUST discard any pending changes to a voiceprint due to verification training or verification adaptation. If the value is "false" (the default), the server MUST commit any pending changes for a training session or a successful
Abort Model header字段指示会话终止时验证器资源的所需行为。如果此标题字段的值为“true”,则服务器必须放弃由于验证培训或验证自适应而对声纹进行的任何挂起的更改。如果该值为“false”(默认值),则服务器必须提交任何挂起的更改,以进行培训会话或成功的测试
verification session to the voiceprint repository. A value of "true" for Abort-Model overrides a value of "true" for the Adapt-Model header field. This header field MAY occur in the END-SESSION method.
验证会话到声纹存储库。中止模型的值为“真”将覆盖自适应模型标题字段的值为“真”。此标头字段可能出现在结束会话方法中。
abort-model = "Abort-Model" ":" BOOLEAN CRLF
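For instance (request-id and channel identifier illustrative), a client abandoning a training session without committing any pending voiceprint changes could end it as follows:

C->S: MRCP/2.0 ... END-SESSION 543321
      Channel-Identifier:32AECB23433803@speakverify
      Abort-Model:true

S->C: MRCP/2.0 ... 543321 200 COMPLETE
      Channel-Identifier:32AECB23433803@speakverify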
The Min-Verification-Score header field, when used with a verifier resource through a SET-PARAMS, GET-PARAMS, or START-SESSION method, determines the minimum verification score for which a verification decision of "accepted" may be declared by the server. This is a float value between -1.0 and 1.0. The default value for this header field is implementation specific.
当通过SET-PARAMS、GET-PARAMS或START-SESSION方法与验证器资源一起使用时,Min Verification Score标头字段确定服务器可以声明“已接受”的验证决定的最小验证分数。这是一个介于-1.0和1.0之间的浮点值。此标头字段的默认值是特定于实现的。
min-verification-score = "Min-Verification-Score" ":" [ %x2D ] FLOAT CRLF
The Num-Min-Verification-Phrases header field is used to specify the minimum number of valid utterances before a positive decision is given for verification. The value for this header field is an integer and the default value is 1. The verifier resource MUST NOT declare a verification 'accepted' unless Num-Min-Verification-Phrases valid utterances have been received. The minimum value is 1. This header field MAY occur in START-SESSION, SET-PARAMS, or GET-PARAMS.
Num Min Verification Phrases头字段用于指定在给出验证的肯定决策之前有效话语的最小数量。此标头字段的值为整数,默认值为1。除非收到Num-Min验证短语有效语句,否则验证程序资源不得声明验证“已接受”。最小值为1。此标头字段可能出现在START-SESSION、SET-PARAMS或GET-PARAMS中。
num-min-verification-phrases = "Num-Min-Verification-Phrases" ":" 1*19DIGIT CRLF
The Num-Max-Verification-Phrases header field is used to specify the number of valid utterances required before a decision is forced for verification. The verifier resource MUST NOT return a decision of 'undecided' once Num-Max-Verification-Phrases have been collected and used to determine a verification score. The value for this header field is an integer and the minimum value is 1. The default value is implementation specific. This header field MAY occur in START-SESSION, SET-PARAMS, or GET-PARAMS.
num-max-verification-phrases = "Num-Max-Verification-Phrases" ":" 1*19DIGIT CRLF
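As a non-normative illustration (the request-id and the specific counts are arbitrary), a client could bound the number of utterances considered in a session by including both header fields in START-SESSION:

C->S: MRCP/2.0 ... START-SESSION 314171
      Channel-Identifier:32AECB23433801@speakverify
      Repository-URI:http://www.example.com/voiceprints/
      Voiceprint-Identifier:johnsmith.voiceprint
      Verification-Mode:verify
      Num-Min-Verification-Phrases:2
      Num-Max-Verification-Phrases:4

S->C: MRCP/2.0 ... 314171 200 COMPLETE
      Channel-Identifier:32AECB23433801@speakverify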
The No-Input-Timeout header field sets the length of time from the start of the verification timers (see START-INPUT-TIMERS) until the VERIFICATION-COMPLETE server event message declares that no input has been received (i.e., has a Completion-Cause of no-input-timeout). The value is in milliseconds. This header field MAY occur in VERIFY, SET-PARAMS, or GET-PARAMS. The value for this header field ranges from 0 to an implementation-specific maximum value. The default value for this header field is implementation specific.
no-input-timeout = "No-Input-Timeout" ":" 1*19DIGIT CRLF
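For illustration only (the request-id and the 5000 ms value are arbitrary), a client expecting prompt caller input might issue:

C->S: MRCP/2.0 ... VERIFY 314172
      Channel-Identifier:32AECB23433801@speakverify
      No-Input-Timeout:5000

S->C: MRCP/2.0 ... 314172 200 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speakverify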
This header field allows the client to request that the verifier resource save the audio stream that was used for verification/identification. The verifier resource MUST attempt to record the audio and make it available to the client in the form of a URI returned in the Waveform-URI header field in the VERIFICATION-COMPLETE event. If there was an error in recording the stream, or the audio content is otherwise not available, the verifier resource MUST return an empty Waveform-URI header field. The default value for this header field is "false". This header field MAY appear in the VERIFY method. Note that this header field does not appear in the VERIFY-FROM-BUFFER method since it only controls whether or not to save the waveform for live verification/identification operations.
save-waveform = "Save-Waveform" ":" BOOLEAN CRLF
This header field MAY be specified in the SET-PARAMS, GET-PARAMS, or the VERIFY methods and tells the server resource the media type of the captured audio or video such as the one captured and returned by the Waveform-URI header field.
media-type = "Media-Type" ":" media-type-value CRLF
If the Save-Waveform header field is set to "true", the verifier resource MUST attempt to record the incoming audio stream of the verification into a file and provide a URI for the client to access it. This header field MUST be present in the VERIFICATION-COMPLETE event if the Save-Waveform header field was set to true by the client. The value of the header field MUST be empty if there was
some error condition preventing the server from recording. Otherwise, the URI generated by the server MUST be globally unique across the server and all its verification sessions. The content MUST be available via the URI until the verification session ends. Since the Save-Waveform header field applies only to live verification/identification operations, the server can return the Waveform-URI only in the VERIFICATION-COMPLETE event for live verification/identification operations.
The server MUST also return the size in octets and the duration in milliseconds of the recorded audio waveform as parameters associated with the header field.
waveform-uri = "Waveform-URI" ":" ["<" uri ">" ";" "size" "=" 1*19DIGIT ";" "duration" "=" 1*19DIGIT] CRLF
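The following non-normative sketch shows how Save-Waveform, Media-Type, and Waveform-URI fit together; the request-id, recording URI, size, and duration values are invented for illustration, and the NLSML result body of the event is omitted:

C->S: MRCP/2.0 ... VERIFY 314173
      Channel-Identifier:32AECB23433801@speakverify
      Save-Waveform:true
      Media-Type:audio/x-wav

S->C: MRCP/2.0 ... 314173 200 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speakverify

S->C: MRCP/2.0 ... VERIFICATION-COMPLETE 314173 COMPLETE
      Channel-Identifier:32AECB23433801@speakverify
      Completion-Cause:000 success
      Waveform-URI:<http://www.example.com/recordings/utt-0001.wav>;size=8000;duration=500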
This header field MUST be returned in QUERY-VOICEPRINT and DELETE-VOICEPRINT responses. This is the status of the voiceprint specified in the QUERY-VOICEPRINT method. For the DELETE-VOICEPRINT method, this header field indicates the status of the voiceprint at the moment the method execution started.
voiceprint-exists = "Voiceprint-Exists" ":" BOOLEAN CRLF
This header field is used to indicate that this utterance could be later considered for speaker verification. This way, a client can request the server to buffer utterances while doing regular recognition or verification activities, and speaker verification can later be requested on the buffered utterances. This header field is optional in the RECOGNIZE, VERIFY, and RECORD methods. The default value for this header field is "false".
ver-buffer-utterance = "Ver-Buffer-Utterance" ":" BOOLEAN CRLF
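As a non-normative sketch, a recognizer resource in the same session could be asked to buffer the utterance it processes for later verification from the buffer; the channel identifier, request-id, and grammar reference below are illustrative, and the grammar body is omitted:

C->S: MRCP/2.0 ... RECOGNIZE 314175
      Channel-Identifier:32AECB23433801@speechrecog
      Ver-Buffer-Utterance:true
      Content-Type:application/srgs+xml
      Content-ID:<request1@form-level.store>
      Content-Length:...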
This header field specifies stored audio content that the client requests the server to fetch and process according to the current verification mode, either to train the voiceprint or verify a claimed identity. This header field enables the client to implement the
buffering use case where the recognizer and verifier resources are in different sessions and the verification buffer technique cannot be used. It MAY be specified on the VERIFY request.
input-waveform-uri = "Input-Waveform-URI" ":" uri CRLF
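For illustration only (the request-id and audio URI are invented), a client could ask the verifier resource to process previously captured audio by reference:

C->S: MRCP/2.0 ... VERIFY 314176
      Channel-Identifier:32AECB23433801@speakverify
      Input-Waveform-URI:http://www.example.com/utterances/utt-0001.wav

S->C: MRCP/2.0 ... 314176 200 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speakverify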
This header field MUST be part of a VERIFICATION-COMPLETE event from the verifier resource to the client. This indicates the cause of VERIFY or VERIFY-FROM-BUFFER method completion. This header field MUST be sent in the VERIFY, VERIFY-FROM-BUFFER, and QUERY-VOICEPRINT responses, if they return with a failure status and a COMPLETE state. In the ABNF below, the 'cause-code' contains a numerical value selected from the Cause-Code column of the following table. The 'cause-name' contains the corresponding token selected from the Cause-Name column.
completion-cause = "Completion-Cause" ":" cause-code SP cause-name CRLF
cause-code       = 3DIGIT
cause-name       = *VCHAR
+------------+--------------------------+-----------------------------+
| Cause-Code | Cause-Name               | Description                 |
+------------+--------------------------+-----------------------------+
| 000        | success                  | VERIFY or                   |
|            |                          | VERIFY-FROM-BUFFER request  |
|            |                          | completed successfully.     |
|            |                          | The verify decision can be  |
|            |                          | "accepted", "rejected", or  |
|            |                          | "undecided".                |
| 001        | error                    | VERIFY or                   |
|            |                          | VERIFY-FROM-BUFFER request  |
|            |                          | terminated prematurely due  |
|            |                          | to a verifier resource or   |
|            |                          | system error.               |
| 002        | no-input-timeout         | VERIFY request completed    |
|            |                          | with no result due to a     |
|            |                          | no-input-timeout.           |
| 003        | too-much-speech-timeout  | VERIFY request completed    |
|            |                          | with no result due to too   |
|            |                          | much speech.                |
| 004        | speech-too-early         | VERIFY request completed    |
|            |                          | with no result due to       |
|            |                          | speech too soon.            |
| 005        | buffer-empty             | VERIFY-FROM-BUFFER request  |
|            |                          | completed with no result    |
|            |                          | due to empty buffer.        |
| 006        | out-of-sequence          | Verification operation      |
|            |                          | failed due to               |
|            |                          | out-of-sequence method      |
|            |                          | invocations, for example,   |
|            |                          | calling VERIFY before       |
|            |                          | QUERY-VOICEPRINT.           |
| 007        | repository-uri-failure   | Failure accessing           |
|            |                          | Repository URI.             |
| 008        | repository-uri-missing   | Repository-URI is not       |
|            |                          | specified.                  |
| 009        | voiceprint-id-missing    | Voiceprint-Identifier is    |
|            |                          | not specified.              |
| 010        | voiceprint-id-not-exist  | Voiceprint-Identifier does  |
|            |                          | not exist in the voiceprint |
|            |                          | repository.                 |
| 011        | speech-not-usable        | VERIFY request completed    |
|            |                          | with no result because the  |
|            |                          | speech was not usable (too  |
|            |                          | noisy, too short, etc.)     |
+------------+--------------------------+-----------------------------+
This header field MAY be specified in a VERIFICATION-COMPLETE event coming from the verifier resource to the client. It contains the reason text behind the VERIFY request completion. This header field communicates text describing the reason for the failure.
The completion reason text is provided for client use in logs and for debugging and instrumentation purposes. Clients MUST NOT interpret the completion reason text.
completion-reason = "Completion-Reason" ":" quoted-string CRLF
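A non-normative sketch of a failed request carrying both header fields follows; the request-id and the reason text are illustrative only:

S->C: MRCP/2.0 ... VERIFICATION-COMPLETE 314179 COMPLETE
      Channel-Identifier:32AECB23433801@speakverify
      Completion-Cause:005 buffer-empty
      Completion-Reason:"verification buffer contained no audio"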
This header field is the same as the one described for the Recognizer resource. See Section 9.4.15. This header field MAY occur in VERIFY, SET-PARAMS, or GET-PARAMS.
This header field is the same as the one described for the Recognizer resource. See Section 9.4.23. This header field MAY be specified in a VERIFY request.
This header field MUST be sent in a STOP request to indicate whether or not to abort a VERIFY method in progress. A value of "true" requests the server to discard the results. A value of "false" requests the server to return in the STOP response the verification results obtained up to the point it received the STOP request.
abort-verification = "Abort-Verification" ":" BOOLEAN CRLF
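The following non-normative exchange sketches a STOP that asks the server to return, rather than discard, the results gathered so far; the request-ids are arbitrary, with 314183 standing for the VERIFY request being stopped, and the NLSML body of the response is omitted:

C->S: MRCP/2.0 ... STOP 314180
      Channel-Identifier:32AECB23433801@speakverify
      Abort-Verification:false

S->C: MRCP/2.0 ... 314180 200 COMPLETE
      Channel-Identifier:32AECB23433801@speakverify
      Active-Request-Id-List:314183
      Completion-Cause:000 success
      Content-Type:application/nlsml+xml
      Content-Length:...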
This header field MAY be sent as part of a VERIFY request. A value of "false" tells the verifier resource to start the VERIFY operation but not to start the no-input timer yet. The verifier resource MUST NOT start the timers until the client sends a START-INPUT-TIMERS request to the resource. This is useful in the scenario when the verifier and synthesizer resources are not part of the same session. In this scenario, when a kill-on-barge-in prompt is being played, the client may want the VERIFY request to be simultaneously active so that it can detect and implement kill-on-barge-in (see Section 8.4.2). But at the same time, the client doesn't want the verifier resource to start the no-input timers until the prompt is finished. The default value is "true".
start-input-timers = "Start-Input-Timers" ":" BOOLEAN CRLF
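As a non-normative illustration (request-ids arbitrary), a client playing a kill-on-barge-in prompt through a separate synthesizer session could start verification with the timers disabled and start them once the prompt completes:

C->S: MRCP/2.0 ... VERIFY 314181
      Channel-Identifier:32AECB23433801@speakverify
      Start-Input-Timers:false

S->C: MRCP/2.0 ... 314181 200 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speakverify

C->S: MRCP/2.0 ... START-INPUT-TIMERS 314182
      Channel-Identifier:32AECB23433801@speakverify

S->C: MRCP/2.0 ... 314182 200 COMPLETE
      Channel-Identifier:32AECB23433801@speakverify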
A verification response or event message can carry additional data as described in the following subsection.
Verification results are returned to the client in the message body of the VERIFICATION-COMPLETE event or the GET-INTERMEDIATE-RESULT response message as described in Section 6.3. Element and attribute descriptions for the verification portion of the NLSML format are provided in Section 11.5.2 with a normative definition of the schema in Section 16.3.
All verification elements are contained within a single <verification-result> element under <result>. The elements are described below and have the schema defined in Section 16.2. The following elements are defined:
1. <voiceprint>
2. <incremental>
3. <cumulative>
4. <decision>
5. <utterance-length>
6. <device>
7. <gender>
8. <adapted>
9. <verification-score>
10. <vendor-specific-results>
This element in the verification results provides information on how the speech data matched a single voiceprint. The result data returned MAY have more than one such entity in the case of identification or multi-verification. Each <voiceprint> element and the XML data within the element describe verification result information for how well the speech data matched that particular voiceprint. The list of <voiceprint> element data are ordered according to their cumulative verification match scores, with the highest score first.
Within each <voiceprint> element there MUST be a <cumulative> element with the cumulative scores of how well multiple utterances matched the voiceprint.
The first <voiceprint> element MAY contain an <incremental> element with the incremental scores of how well the last utterance matched the voiceprint.
This element is found within the <incremental> or <cumulative> element within the verification results. Its value indicates the verification decision. It can have the values of "accepted", "rejected", or "undecided".
This element MAY occur within either the <incremental> or <cumulative> elements within the first <voiceprint> element. Its value indicates the length, in milliseconds, of the last utterance or of the cumulated set of utterances, respectively.
This element is found within the <incremental> or <cumulative> element within the verification results. Its value indicates the apparent type of device used by the caller as determined by the verifier resource. It can have the values of "cellular-phone", "electret-phone", "carbon-button-phone", or "unknown".
This element is found within the <incremental> or <cumulative> element within the verification results. Its value indicates the apparent gender of the speaker as determined by the verifier resource. It can have the values of "male", "female", or "unknown".
This element is found within the first <voiceprint> element within the verification results. When verification is trying to confirm the voiceprint, this indicates if the voiceprint has been adapted as a consequence of analyzing the source utterances. It is not returned during verification training. The value can be "true" or "false".
This element is found within the <incremental> or <cumulative> element within the verification results. Its value indicates the score of the last utterance as determined by verification.
During verification, the higher the score, the more likely it is that the speaker is the same one as the one who spoke the voiceprint utterances. During training, the higher the score, the more likely the speaker is to have spoken all of the analyzed utterances. The value is a floating point between -1.0 and 1.0. If there are no such utterances, the score is -1. Note that the verification score is not a probability value.
MRCPv2 servers MAY send verification results that contain implementation-specific data that augment the information provided by the MRCPv2-defined elements. Such data might be useful to clients who have private knowledge of how to interpret these schema extensions. Implementation-specific additions to the verification results schema MUST belong to the vendor's own namespace. In the result structure, either they MUST be indicated by a namespace prefix declared within the result, or they MUST be children of an element identified as belonging to the respective namespace.
The following example shows the results of three voiceprints. Note that the first one has crossed the verification score threshold, and the speaker has been accepted. The voiceprint was also adapted with the most recent utterance.
<?xml version="1.0"?> <result xmlns="urn:ietf:params:xml:ns:mrcpv2" grammar="What-Grammar-URI"> <verification-result> <voiceprint id="johnsmith"> <adapted> true </adapted> <incremental> <utterance-length> 500 </utterance-length> <device> cellular-phone </device> <gender> male </gender> <decision> accepted </decision> <verification-score> 0.98514 </verification-score> </incremental> <cumulative> <utterance-length> 10000 </utterance-length> <device> cellular-phone </device> <gender> male </gender> <decision> accepted </decision> <verification-score> 0.96725</verification-score> </cumulative> </voiceprint>
<voiceprint id="marysmith"> <cumulative> <verification-score> 0.93410 </verification-score> </cumulative> </voiceprint> <voiceprint uri="juniorsmith"> <cumulative> <verification-score> 0.74209 </verification-score> </cumulative> </voiceprint> </verification-result> </result>
Verification Results Example 1
In this next example, the verifier has enough information to decide to reject the speaker.
<?xml version="1.0"?> <result xmlns="urn:ietf:params:xml:ns:mrcpv2" xmlns:xmpl="http://www.example.org/2003/12/mrcpv2" grammar="What-Grammar-URI"> <verification-result> <voiceprint id="johnsmith"> <incremental> <utterance-length> 500 </utterance-length> <device> cellular-phone </device> <gender> male </gender> <verification-score> 0.88514 </verification-score> <xmpl:raspiness> high </xmpl:raspiness> <xmpl:emotion> sadness </xmpl:emotion> </incremental> <cumulative> <utterance-length> 10000 </utterance-length> <device> cellular-phone </device> <gender> male </gender> <decision> rejected </decision> <verification-score> 0.9345 </verification-score> </cumulative> </voiceprint> </verification-result> </result>
Verification Results Example 2
The START-SESSION method starts a speaker verification or speaker identification session. Execution of this method places the verifier resource into its initial state. If this method is called during an ongoing verification session, the previous session is implicitly aborted. If this method is invoked when VERIFY or VERIFY-FROM-BUFFER is active, the method fails and the server returns a status-code of 402.
Upon completion of the START-SESSION method, the verifier resource MUST have terminated any ongoing verification session and cleared any voiceprint designation.
A verification session is associated with the voiceprint repository to be used during the session. This is specified through the Repository-URI header field (see Section 11.4.1).
The START-SESSION method also establishes, through the Voiceprint-Identifier header field, which voiceprints are to be matched or trained during the verification session. If this is an Identification session or if the client wants to do Multi-Verification, the Voiceprint-Identifier header field contains a list of semicolon-separated voiceprint identifiers.
The Adapt-Model header field MAY also be present in the START-SESSION request to indicate whether or not to adapt a voiceprint based on data collected during the session (if the voiceprint verification phase succeeds). By default, the voiceprint model MUST NOT be adapted with data from a verification session.
The START-SESSION method also determines whether the session is used to train a voiceprint or to verify a voiceprint. Hence, the Verification-Mode header field MUST be sent in every START-SESSION request. The value of the Verification-Mode header field MUST be either "train" or "verify".
Before a verification/identification session is started, the client may only request that VERIFY-ROLLBACK and generic SET-PARAMS and GET-PARAMS operations be performed on the verifier resource. The server MUST return status-code 402 "Method not valid in this state" for all other verification operations.
A verifier resource MUST NOT have more than a single session active at one time.
C->S: MRCP/2.0 ... START-SESSION 314161 Channel-Identifier:32AECB23433801@speakverify Repository-URI:http://www.example.com/voiceprintdbase/ Verification-Mode:verify Voiceprint-Identifier:johnsmith.voiceprint Adapt-Model:true
S->C: MRCP/2.0 ... 314161 200 COMPLETE Channel-Identifier:32AECB23433801@speakverify
The END-SESSION method terminates an ongoing verification session and releases the verification voiceprint resources. The session may terminate in one of three ways:
1. abort - the voiceprint adaptation or creation may be aborted so that the voiceprint remains unchanged (or is not created).
2. commit - when terminating a voiceprint training session, the new voiceprint is committed to the repository.
3. adapt - an existing voiceprint is modified using a successful verification.
The Abort-Model header field MAY be included in the END-SESSION to control whether or not to abort any pending changes to the voiceprint. The default behavior is to commit (not abort) any pending changes to the designated voiceprint.
The END-SESSION method may be safely executed multiple times without first executing the START-SESSION method. Any additional executions of this method without an intervening use of the START-SESSION method have no effect on the verifier resource.
The following example assumes there is either a training session or a verification session in progress.
C->S: MRCP/2.0 ... END-SESSION 314174 Channel-Identifier:32AECB23433801@speakverify Abort-Model:true
S->C: MRCP/2.0 ... 314174 200 COMPLETE Channel-Identifier:32AECB23433801@speakverify
The QUERY-VOICEPRINT method is used to get status information on a particular voiceprint and can be used by the client to ascertain if a voiceprint or repository exists and if it contains trained voiceprints.
The response to the QUERY-VOICEPRINT request contains an indication of the status of the designated voiceprint in the Voiceprint-Exists header field, allowing the client to determine whether to use the current voiceprint for verification, train a new voiceprint, or choose a different voiceprint.
A voiceprint is completely specified by providing a repository location and a voiceprint identifier. The particular voiceprint or identity within the repository is specified by a string identifier that is unique within the repository. The Voiceprint-Identifier header field carries this unique voiceprint identifier within a given repository.
The following example assumes a verification session is in progress and the voiceprint exists in the voiceprint repository.
C->S: MRCP/2.0 ... QUERY-VOICEPRINT 314168 Channel-Identifier:32AECB23433801@speakverify Repository-URI:http://www.example.com/voiceprints/ Voiceprint-Identifier:johnsmith.voiceprint
S->C: MRCP/2.0 ... 314168 200 COMPLETE Channel-Identifier:32AECB23433801@speakverify Repository-URI:http://www.example.com/voiceprints/ Voiceprint-Identifier:johnsmith.voiceprint Voiceprint-Exists:true
The following example assumes that the URI provided in the Repository-URI header field is a bad URI.
C->S: MRCP/2.0 ... QUERY-VOICEPRINT 314168 Channel-Identifier:32AECB23433801@speakverify Repository-URI:http://www.example.com/bad-uri/ Voiceprint-Identifier:johnsmith.voiceprint
S->C: MRCP/2.0 ... 314168 405 COMPLETE Channel-Identifier:32AECB23433801@speakverify Repository-URI:http://www.example.com/bad-uri/ Voiceprint-Identifier:johnsmith.voiceprint Completion-Cause:007 repository-uri-failure
The DELETE-VOICEPRINT method removes a voiceprint from a repository. This method MUST carry the Repository-URI and Voiceprint-Identifier header fields.
An MRCPv2 server MUST reject a DELETE-VOICEPRINT request with a 401 status code unless the MRCPv2 client has been authenticated and authorized. Note that MRCPv2 does not have a standard mechanism for this. See Section 12.8.
If the corresponding voiceprint does not exist, the DELETE-VOICEPRINT method MUST return a 200 status code.
The following example demonstrates a DELETE-VOICEPRINT operation to remove a specific voiceprint.
C->S: MRCP/2.0 ... DELETE-VOICEPRINT 314168 Channel-Identifier:32AECB23433801@speakverify Repository-URI:http://www.example.com/bad-uri/ Voiceprint-Identifier:johnsmith.voiceprint
S->C: MRCP/2.0 ... 314168 200 COMPLETE Channel-Identifier:32AECB23433801@speakverify
The VERIFY method is used to request that the verifier resource either train/adapt the voiceprint or verify/identify a claimed identity. If the voiceprint is new or was deleted by a previous DELETE-VOICEPRINT method, the VERIFY method trains the voiceprint. If the voiceprint already exists, it is adapted and not retrained by the VERIFY command.
C->S: MRCP/2.0 ... VERIFY 543260 Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 543260 200 IN-PROGRESS Channel-Identifier:32AECB23433801@speakverify
When the VERIFY request completes, the MRCPv2 server MUST send a VERIFICATION-COMPLETE event to the client.
The VERIFY-FROM-BUFFER method directs the verifier resource to verify buffered audio against a voiceprint. Only one VERIFY or VERIFY-FROM-BUFFER method may be active for a verifier resource at a time.
The buffered audio is not consumed by this method and thus VERIFY-FROM-BUFFER may be invoked multiple times by the client to attempt verification against different voiceprints.
For the VERIFY-FROM-BUFFER method, the server MAY optionally return an IN-PROGRESS response before the VERIFICATION-COMPLETE event.
When the VERIFY-FROM-BUFFER method is invoked and the verification buffer is in use by another resource sharing it, the server MUST return an IN-PROGRESS response and wait until the buffer is available to it. The verification buffer is owned by the verifier resource but is shared with write access from other input resources on the same session. Hence, it is considered to be in use if there is a read or write operation such as a RECORD or RECOGNIZE with the Ver-Buffer-Utterance header field set to "true" on a resource that shares this buffer. Note that if a RECORD or RECOGNIZE method returns with a failure cause code, the VERIFY-FROM-BUFFER request waiting to process that buffer MUST also fail with a Completion-Cause of 005 (buffer-empty).
The following example illustrates the usage of some buffering methods. In this scenario, the client first performed a live verification, but the utterance had been rejected. In the meantime, the utterance is also saved to the audio buffer. Then, another voiceprint is used to do verification against the audio buffer and the utterance is accepted. For the example, we assume both Num-Min-Verification-Phrases and Num-Max-Verification-Phrases are 1.
C->S: MRCP/2.0 ... START-SESSION 314161 Channel-Identifier:32AECB23433801@speakverify Verification-Mode:verify Adapt-Model:true Repository-URI:http://www.example.com/voiceprints Voiceprint-Identifier:johnsmith.voiceprint
S->C: MRCP/2.0 ... 314161 200 COMPLETE Channel-Identifier:32AECB23433801@speakverify
C->S: MRCP/2.0 ... VERIFY 314162 Channel-Identifier:32AECB23433801@speakverify Ver-buffer-utterance:true
S->C: MRCP/2.0 ... 314162 200 IN-PROGRESS Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... VERIFICATION-COMPLETE 314162 COMPLETE Channel-Identifier:32AECB23433801@speakverify Completion-Cause:000 success Content-Type:application/nlsml+xml Content-Length:...
<?xml version="1.0"?> <result xmlns="urn:ietf:params:xml:ns:mrcpv2" grammar="What-Grammar-URI"> <verification-result> <voiceprint id="johnsmith"> <incremental> <utterance-length> 500 </utterance-length> <device> cellular-phone </device> <gender> female </gender> <decision> rejected </decision> <verification-score> 0.05465 </verification-score> </incremental> <cumulative> <utterance-length> 500 </utterance-length> <device> cellular-phone </device> <gender> female </gender> <decision> rejected </decision> <verification-score> 0.05465 </verification-score> </cumulative> </voiceprint> </verification-result> </result>
C->S: MRCP/2.0 ... QUERY-VOICEPRINT 314163 Channel-Identifier:32AECB23433801@speakverify Repository-URI:http://www.example.com/voiceprints/ Voiceprint-Identifier:johnsmith.voiceprint
S->C: MRCP/2.0 ... 314163 200 COMPLETE Channel-Identifier:32AECB23433801@speakverify Repository-URI:http://www.example.com/voiceprints/ Voiceprint-Identifier:johnsmith.voiceprint Voiceprint-Exists:true
C->S: MRCP/2.0 ... START-SESSION 314164 Channel-Identifier:32AECB23433801@speakverify Verification-Mode:verify Adapt-Model:true Repository-URI:http://www.example.com/voiceprints Voiceprint-Identifier:marysmith.voiceprint
S->C: MRCP/2.0 ... 314164 200 COMPLETE Channel-Identifier:32AECB23433801@speakverify
C->S: MRCP/2.0 ... VERIFY-FROM-BUFFER 314165 Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 314165 200 IN-PROGRESS Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... VERIFICATION-COMPLETE 314165 COMPLETE Channel-Identifier:32AECB23433801@speakverify Completion-Cause:000 success Content-Type:application/nlsml+xml Content-Length:...
<?xml version="1.0"?> <result xmlns="urn:ietf:params:xml:ns:mrcpv2" grammar="What-Grammar-URI"> <verification-result> <voiceprint id="marysmith"> <incremental> <utterance-length> 1000 </utterance-length> <device> cellular-phone </device> <gender> female </gender> <decision> accepted </decision> <verification-score> 0.98 </verification-score> </incremental> <cumulative> <utterance-length> 1000 </utterance-length> <device> cellular-phone </device> <gender> female </gender> <decision> accepted </decision> <verification-score> 0.98 </verification-score> </cumulative> </voiceprint> </verification-result> </result>
C->S: MRCP/2.0 ... END-SESSION 314166 Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 314166 200 COMPLETE Channel-Identifier:32AECB23433801@speakverify
VERIFY-FROM-BUFFER Example
The VERIFY-ROLLBACK method discards the last buffered utterance or discards the last live utterances (when the mode is "train" or "verify"). The client will likely want to invoke this method when the user provides undesirable input such as non-speech noises, side-speech, out-of-grammar utterances, commands, etc. Note that this method does not provide a stack of rollback states. Executing VERIFY-ROLLBACK twice in succession without an intervening recognition operation has no effect on the second attempt.
C->S: MRCP/2.0 ... VERIFY-ROLLBACK 314165 Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 314165 200 COMPLETE Channel-Identifier:32AECB23433801@speakverify
VERIFY-ROLLBACK Example
The STOP method from the client to the server tells the verifier resource to stop the VERIFY or VERIFY-FROM-BUFFER request if one is active. If such a request is active and the STOP request successfully terminated it, then the response header section contains an Active-Request-Id-List header field containing the request-id of the VERIFY or VERIFY-FROM-BUFFER request that was terminated. In this case, no VERIFICATION-COMPLETE event is sent for the terminated request. If there was no verify request active, then the response MUST NOT contain an Active-Request-Id-List header field. Either way, the response MUST contain a status-code of 200 "Success".
The STOP method can carry an Abort-Verification header field, which specifies if the verification result until that point should be discarded or returned. If this header field is not present or if the value is "true", the verification result is discarded and the STOP response does not contain any result data. If the header field is present and its value is "false", the STOP response MUST contain a Completion-Cause header field and carry the Verification result data in its body.
An aborted VERIFY request does an automatic rollback and hence does not affect the cumulative score. A VERIFY request that was stopped with no Abort-Verification header field or with the Abort-Verification header field set to "false" does affect cumulative scores and would need to be explicitly rolled back if the client does not want the verification result considered in the cumulative scores.
The following example assumes a voiceprint identity has already been established.
C->S: MRCP/2.0 ... VERIFY 314177 Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 314177 200 IN-PROGRESS Channel-Identifier:32AECB23433801@speakverify
C->S: MRCP/2.0 ... STOP 314178 Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 314178 200 COMPLETE Channel-Identifier:32AECB23433801@speakverify Active-Request-Id-List:314177
STOP Verification Example
This request is sent from the client to the verifier resource to start the no-input timer, usually once the client has ascertained that any audio prompts to the user have played to completion.
C->S: MRCP/2.0 ... START-INPUT-TIMERS 543260 Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 543260 200 COMPLETE Channel-Identifier:32AECB23433801@speakverify
The VERIFICATION-COMPLETE event follows a call to VERIFY or VERIFY-FROM-BUFFER and is used to communicate the verification results to the client. The event message body contains only verification results.
S->C: MRCP/2.0 ... VERIFICATION-COMPLETE 543259 COMPLETE Completion-Cause:000 success Content-Type:application/nlsml+xml Content-Length:...
<?xml version="1.0"?> <result xmlns="urn:ietf:params:xml:ns:mrcpv2" grammar="What-Grammar-URI"> <verification-result> <voiceprint id="johnsmith">
<incremental> <utterance-length> 500 </utterance-length> <device> cellular-phone </device> <gender> male </gender> <decision> accepted </decision> <verification-score> 0.85 </verification-score> </incremental> <cumulative> <utterance-length> 1500 </utterance-length> <device> cellular-phone </device> <gender> male </gender> <decision> accepted </decision> <verification-score> 0.75 </verification-score> </cumulative> </voiceprint> </verification-result> </result>
The START-OF-INPUT event is returned from the server to the client once the server has detected speech. This event is always returned by the verifier resource when speech has been detected, irrespective of whether or not the recognizer and verifier resources share the same session.
S->C: MRCP/2.0 ... START-OF-INPUT 543259 IN-PROGRESS Channel-Identifier:32AECB23433801@speakverify
The CLEAR-BUFFER method can be used to clear the verification buffer. This buffer is used to buffer speech during recognition, record, or verification operations that may later be used by VERIFY-FROM-BUFFER. As noted before, the buffer associated with the verifier resource is shared by other input resources like recognizers and recorders. Hence, a CLEAR-BUFFER request fails if the verification buffer is in use. This can happen when any one of the input resources that share this buffer has an active read or write operation such as RECORD, RECOGNIZE, or VERIFY with the Ver-Buffer-Utterance header field set to "true".
C->S: MRCP/2.0 ... CLEAR-BUFFER 543260 Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 543260 200 COMPLETE Channel-Identifier:32AECB23433801@speakverify
A client can use the GET-INTERMEDIATE-RESULT method to poll for intermediate results of a verification request that is in progress. Invoking this method does not change the state of the resource. The verifier resource collects the accumulated verification results and returns the information in the method response. The message body in the response to a GET-INTERMEDIATE-RESULT REQUEST contains only verification results. The method response MUST NOT contain a Completion-Cause header field as the request is not yet complete. If the resource does not have a verification in progress, the response has a 402 failure status-code and no result in the body.
C->S: MRCP/2.0 ... GET-INTERMEDIATE-RESULT 543260 Channel-Identifier:32AECB23433801@speakverify
S->C: MRCP/2.0 ... 543260 200 COMPLETE Channel-Identifier:32AECB23433801@speakverify Content-Type:application/nlsml+xml Content-Length:...
<?xml version="1.0"?> <result xmlns="urn:ietf:params:xml:ns:mrcpv2" grammar="What-Grammar-URI"> <verification-result> <voiceprint id="marysmith"> <incremental> <utterance-length> 50 </utterance-length> <device> cellular-phone </device> <gender> female </gender> <decision> undecided </decision> <verification-score> 0.85 </verification-score> </incremental> <cumulative> <utterance-length> 150 </utterance-length> <device> cellular-phone </device> <gender> female </gender> <decision> undecided </decision> <verification-score> 0.65 </verification-score> </cumulative> </voiceprint> </verification-result> </result>
MRCPv2 is designed to comply with the security-related requirements documented in the SPEECHSC requirements [RFC4313]. Implementers and users of MRCPv2 are strongly encouraged to read the Security Considerations section of [RFC4313], because that document contains discussion of a number of important security issues associated with the utilization of speech as biometric authentication technology, and on the threats against systems which store recorded speech, contain large corpora of voiceprints, and send and receive sensitive information based on voice input to a recognizer or speech output from a synthesizer. Specific security measures employed by MRCPv2 are summarized in the following subsections. See the corresponding sections of this specification for how the security-related machinery is invoked by individual protocol operations.
MRCPv2 control sessions are established as media sessions described by SDP within the context of a SIP dialog. In order to ensure secure rendezvous between MRCPv2 clients and servers, the following are required:
1. The SIP implementation in MRCPv2 clients and servers MUST support SIP digest authentication [RFC3261] and SHOULD employ it.
2. The SIP implementation in MRCPv2 clients and servers MUST support 'sips' URIs and SHOULD employ 'sips' URIs; in particular, clients and servers SHOULD set up TLS [RFC5246] connections.
3. If media stream cryptographic keying is done through SDP (e.g., using [RFC4568]), the MRCPv2 clients and servers MUST employ the 'sips' URI.
4. When TLS is used for SIP, the client MUST verify the identity of the server to which it connects, following the rules and guidelines defined in [RFC5922].
Sensitive data is carried over the MRCPv2 control channel. This includes things like the output of speech recognition operations, speaker verification results, input to text-to-speech conversion, personally identifying grammars, etc. For this reason, MRCPv2 servers must be properly authenticated, and the control channel must permit the use of both confidentiality and integrity for the data. To ensure control channel protection, MRCPv2 clients and servers MUST support TLS and SHOULD utilize it by default unless alternative
control channel protection is used. When TLS is used, the client MUST verify the identity of the server to which it connects, following the rules and guidelines defined in [RFC4572]. If there are multiple TLS-protected channels between the client and the server, the server MUST NOT send a response to the client over a channel for which the TLS identities of the server or client differ from the channel over which the server received the corresponding request. Alternative control-channel protection MAY be used if desired (e.g., Security Architecture for the Internet Protocol (IPsec) [RFC4301]).
Sensitive data is also carried on media sessions terminating on MRCPv2 servers (the other end of a media channel may or may not be on the MRCPv2 client). This data includes the user's spoken utterances and the output of text-to-speech operations. MRCPv2 servers MUST support a security mechanism for protection of audio media sessions. MRCPv2 clients that originate or consume audio similarly MUST support a security mechanism for protection of the audio. One such mechanism is the Secure Real-time Transport Protocol (SRTP) [RFC3711].
MRCPv2 employs content indirection extensively. Content may be fetched and/or stored based on URI addressing on systems other than the MRCPv2 client or server. Not all of the stored content is necessarily sensitive (e.g., XML schemas), but the majority generally needs protection, and some indirect content, such as voice recordings and voiceprints, is extremely sensitive and must always be protected. MRCPv2 clients and servers MUST implement HTTPS for indirect content access and SHOULD employ secure access for all sensitive indirect content. Other secure URI schemes such as Secure FTP (FTPS) [RFC4217] MAY also be used. See Section 6.2.15 for the header fields used to transfer cookie information between the MRCPv2 client and server if needed for authentication.
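For instance (the channel identifier, request-id, and URI below are illustrative assumptions, not values defined by this specification), a client might direct the recognizer to fetch a grammar from a separate content server over HTTPS rather than sending it inline:

   C->S:  MRCP/2.0 ... RECOGNIZE 543266
          Channel-Identifier:32AECB23433801@speechrecog
          Content-Type:text/uri-list
          Content-Length:...

          https://grammars.example.com/menu/field1.grxml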
Access to URIs provided by servers introduces risks that need to be considered. Although RFC 6454 [RFC6454] focuses on a same-origin policy, which MRCPv2 does not impose on URIs, Section 3 of that RFC still provides an excellent description of the pitfalls of blindly following server-provided URIs. Servers also need to be aware that clients could provide URIs pointing to sites designed to tie up the server in long or otherwise problematic document fetches. MRCPv2 servers, and the services they access, MUST always be prepared for the possibility of such a denial-of-service attack.
MRCPv2 makes no inherent assumptions about the lifetime and access controls associated with a URI. For example, if neither authentication nor scheme-specific access controls are used, a leak of the URI is equivalent to a leak of the content. Moreover, MRCPv2 makes no specific demands on the lifetime of a URI. If a server offers a URI and the client does not access that URI until much later, the server may have removed the resource in the interim. MRCPv2 deals with this case by using the URI access scheme's 'resource not found' error, such as 404 for HTTPS. How long a server should keep a dynamic resource available is highly application and context dependent. However, the server SHOULD keep the resource available for a reasonable amount of time so that the resource is likely to still be available when the client needs it. Conversely, to mitigate state exhaustion attacks, MRCPv2 servers are not obligated to keep resources and resource state in perpetuity. The server SHOULD delete dynamically generated resources associated with an MRCPv2 session when the session ends.
One method to avoid resource leakage is for the server to use difficult-to-guess, one-time resource URIs. In this case, there can be only a single access to the underlying resource using the given URI. A downside to this approach is that if an attacker uses the URI before the client does, the client is denied the resource. Another method would be to adopt a mechanism similar to the URLAUTH IMAP extension [RFC4467], where the server sets cryptographic checks on URI usage, as well as capabilities for expiration, revocation, and so on. Specifying such a mechanism is beyond the scope of this document.
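Purely as an illustration of the first approach (the host, path, and token are invented for this sketch), a server might hand out a recording URI whose path embeds a single-use, cryptographically random token that the server invalidates after the first successful fetch:

   https://media.example.com/recordings/5Gk7wQ2pXb9TzL4mN8vR/utt-1.wav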
MRCPv2 applications often require the use of stored media. Voice recordings are both stored (e.g., for diagnosis and system tuning) and fetched (for replaying utterances into multiple MRCPv2 resources). Voiceprints are fundamental to the speaker identification and verification functions. This data can be extremely sensitive and can present substantial privacy and impersonation risks if stolen. Systems employing MRCPv2 SHOULD be deployed in ways that minimize these risks. The SPEECHSC requirements RFC [RFC4313] contains a more extensive discussion of these risks and ways they may be mitigated.
DTMF buffers and recognition buffers may grow large enough to exceed the capabilities of a server, and the server MUST be prepared to gracefully handle resource consumption. A server MAY respond with the appropriate recognition incomplete if the server is in danger of running out of resources.
In MRCPv2, there are some tasks, such as URI resource fetches, that the server does on behalf of the client. To control this behavior, MRCPv2 has a number of server parameters that a client can configure. With one such parameter, Fetch-Timeout (Section 6.2.12), a malicious client could set a very large value and then request the server to fetch a non-existent document. It is RECOMMENDED that servers be cautious about accepting long timeout values or abnormally large values for other client-set parameters.
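For example (the values and channel identifier are illustrative only), a client sets this timeout in milliseconds with SET-PARAMS, and a cautious server might reject, or clamp to its own policy ceiling, values far larger than it is willing to honor:

   C->S:  MRCP/2.0 ... SET-PARAMS 543260
          Channel-Identifier:32AECB23433801@speechrecog
          Fetch-Timeout:10000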
Since this specification does not mandate a specific mechanism for authentication and authorization when requesting DELETE-VOICEPRINT (Section 11.9), there is a risk that an MRCPv2 server may not do such a check for authentication and authorization. In practice, each provider of voice biometric solutions does insist on its own authentication and authorization mechanism, outside of this specification, so this is not likely to be a major problem. If in the future voice biometric providers standardize on such a mechanism, then a future version of MRCP can mandate it.
This section describes the name spaces (registries) for MRCPv2 that IANA has created and now maintains. Assignment/registration policies are described in RFC 5226 [RFC5226].
IANA has created a new name space of "MRCPv2 Resource Types". All maintenance within and additions to the contents of this name space MUST be according to the "Standards Action" registration policy. The initial contents of the registry, defined in Section 4.2, are given below:
Resource type   Resource description   Reference
-------------   --------------------   ---------
speechrecog     Speech Recognizer      [RFC6787]
dtmfrecog       DTMF Recognizer        [RFC6787]
speechsynth     Speech Synthesizer     [RFC6787]
basicsynth      Basic Synthesizer      [RFC6787]
speakverify     Speaker Verifier       [RFC6787]
recorder        Speech Recorder        [RFC6787]
IANA has created a new name space of "MRCPv2 Methods and Events". All maintenance within and additions to the contents of this name space MUST be according to the "Standards Action" registration policy. The initial contents of the registry, defined by the "method-name" and "event-name" BNF in Section 15 and explained in Sections 5.2 and 5.5, are given below.
Name                      Resource type   Method/Event   Reference
----                      -------------   ------------   ---------
SET-PARAMS                Generic         Method         [RFC6787]
GET-PARAMS                Generic         Method         [RFC6787]
SPEAK                     Synthesizer     Method         [RFC6787]
STOP                      Synthesizer     Method         [RFC6787]
PAUSE                     Synthesizer     Method         [RFC6787]
RESUME                    Synthesizer     Method         [RFC6787]
BARGE-IN-OCCURRED         Synthesizer     Method         [RFC6787]
CONTROL                   Synthesizer     Method         [RFC6787]
DEFINE-LEXICON            Synthesizer     Method         [RFC6787]
DEFINE-GRAMMAR            Recognizer      Method         [RFC6787]
RECOGNIZE                 Recognizer      Method         [RFC6787]
INTERPRET                 Recognizer      Method         [RFC6787]
GET-RESULT                Recognizer      Method         [RFC6787]
START-INPUT-TIMERS        Recognizer      Method         [RFC6787]
STOP                      Recognizer      Method         [RFC6787]
START-PHRASE-ENROLLMENT   Recognizer      Method         [RFC6787]
ENROLLMENT-ROLLBACK       Recognizer      Method         [RFC6787]
END-PHRASE-ENROLLMENT     Recognizer      Method         [RFC6787]
MODIFY-PHRASE             Recognizer      Method         [RFC6787]
DELETE-PHRASE             Recognizer      Method         [RFC6787]
RECORD                    Recorder        Method         [RFC6787]
STOP                      Recorder        Method         [RFC6787]
START-INPUT-TIMERS        Recorder        Method         [RFC6787]
START-SESSION             Verifier        Method         [RFC6787]
END-SESSION               Verifier        Method         [RFC6787]
QUERY-VOICEPRINT          Verifier        Method         [RFC6787]
DELETE-VOICEPRINT         Verifier        Method         [RFC6787]
VERIFY                    Verifier        Method         [RFC6787]
VERIFY-FROM-BUFFER        Verifier        Method         [RFC6787]
VERIFY-ROLLBACK           Verifier        Method         [RFC6787]
STOP                      Verifier        Method         [RFC6787]
START-INPUT-TIMERS        Verifier        Method         [RFC6787]
GET-INTERMEDIATE-RESULT   Verifier        Method         [RFC6787]
SPEECH-MARKER             Synthesizer     Event          [RFC6787]
SPEAK-COMPLETE            Synthesizer     Event          [RFC6787]
START-OF-INPUT            Recognizer      Event          [RFC6787]
RECOGNITION-COMPLETE      Recognizer      Event          [RFC6787]
INTERPRETATION-COMPLETE   Recognizer      Event          [RFC6787]
START-OF-INPUT            Recorder        Event          [RFC6787]
RECORD-COMPLETE           Recorder        Event          [RFC6787]
VERIFICATION-COMPLETE     Verifier        Event          [RFC6787]
START-OF-INPUT            Verifier        Event          [RFC6787]
IANA has created a new name space of "MRCPv2 Header Fields". All maintenance within and additions to the contents of this name space MUST be according to the "Standards Action" registration policy. The initial contents of the registry, defined by the "message-header" BNF in Section 15 and explained in Section 5.1, are given below. Note that the values permitted for the "Vendor-Specific-Parameters" parameter are managed according to a different policy. See Section 13.1.6.
Name                                Resource type   Reference
----                                -------------   ---------
Channel-Identifier                  Generic         [RFC6787]
Accept                              Generic         [RFC2616]
Active-Request-Id-List              Generic         [RFC6787]
Proxy-Sync-Id                       Generic         [RFC6787]
Accept-Charset                      Generic         [RFC2616]
Content-Type                        Generic         [RFC6787]
Content-ID                          Generic         [RFC2392], [RFC2046], and [RFC5322]
Content-Base                        Generic         [RFC6787]
Content-Encoding                    Generic         [RFC6787]
Content-Location                    Generic         [RFC6787]
Content-Length                      Generic         [RFC6787]
Fetch-Timeout                       Generic         [RFC6787]
Cache-Control                       Generic         [RFC6787]
Logging-Tag                         Generic         [RFC6787]
Set-Cookie                          Generic         [RFC6787]
Vendor-Specific                     Generic         [RFC6787]
Jump-Size                           Synthesizer     [RFC6787]
Kill-On-Barge-In                    Synthesizer     [RFC6787]
Speaker-Profile                     Synthesizer     [RFC6787]
Completion-Cause                    Synthesizer     [RFC6787]
Completion-Reason                   Synthesizer     [RFC6787]
Voice-Parameter                     Synthesizer     [RFC6787]
Prosody-Parameter                   Synthesizer     [RFC6787]
Speech-Marker                       Synthesizer     [RFC6787]
Speech-Language                     Synthesizer     [RFC6787]
Fetch-Hint                          Synthesizer     [RFC6787]
Audio-Fetch-Hint                    Synthesizer     [RFC6787]
Failed-URI                          Synthesizer     [RFC6787]
Failed-URI-Cause                    Synthesizer     [RFC6787]
Speak-Restart                       Synthesizer     [RFC6787]
Speak-Length                        Synthesizer     [RFC6787]
Load-Lexicon                        Synthesizer     [RFC6787]
Lexicon-Search-Order                Synthesizer     [RFC6787]
Confidence-Threshold                Recognizer      [RFC6787]
Sensitivity-Level                   Recognizer      [RFC6787]
Speed-Vs-Accuracy                   Recognizer      [RFC6787]
N-Best-List-Length                  Recognizer      [RFC6787]
Input-Type                          Recognizer      [RFC6787]
No-Input-Timeout                    Recognizer      [RFC6787]
Recognition-Timeout                 Recognizer      [RFC6787]
Waveform-URI                        Recognizer      [RFC6787]
Input-Waveform-URI                  Recognizer      [RFC6787]
Completion-Cause                    Recognizer      [RFC6787]
Completion-Reason                   Recognizer      [RFC6787]
Recognizer-Context-Block            Recognizer      [RFC6787]
Start-Input-Timers                  Recognizer      [RFC6787]
Speech-Complete-Timeout             Recognizer      [RFC6787]
Speech-Incomplete-Timeout           Recognizer      [RFC6787]
Dtmf-Interdigit-Timeout             Recognizer      [RFC6787]
Dtmf-Term-Timeout                   Recognizer      [RFC6787]
Dtmf-Term-Char                      Recognizer      [RFC6787]
Failed-URI                          Recognizer      [RFC6787]
Failed-URI-Cause                    Recognizer      [RFC6787]
Save-Waveform                       Recognizer      [RFC6787]
Media-Type                          Recognizer      [RFC6787]
New-Audio-Channel                   Recognizer      [RFC6787]
Speech-Language                     Recognizer      [RFC6787]
Ver-Buffer-Utterance                Recognizer      [RFC6787]
Recognition-Mode                    Recognizer      [RFC6787]
Cancel-If-Queue                     Recognizer      [RFC6787]
Hotword-Max-Duration                Recognizer      [RFC6787]
Hotword-Min-Duration                Recognizer      [RFC6787]
Interpret-Text                      Recognizer      [RFC6787]
Dtmf-Buffer-Time                    Recognizer      [RFC6787]
Clear-Dtmf-Buffer                   Recognizer      [RFC6787]
Early-No-Match                      Recognizer      [RFC6787]
Num-Min-Consistent-Pronunciations   Recognizer      [RFC6787]