Internet Engineering Task Force (IETF)                      J. Falk, Ed.
Request for Comments: 6590                                   Return Path
Category: Standards Track                              M. Kucherawy, Ed.
ISSN: 2070-1721                                                Cloudmark
                                                              April 2012
        
Internet Engineering Task Force (IETF)                      J. Falk, Ed.
Request for Comments: 6590                                   Return Path
Category: Standards Track                              M. Kucherawy, Ed.
ISSN: 2070-1721                                                Cloudmark
                                                              April 2012
        

Redaction of Potentially Sensitive Data from Mail Abuse Reports

从邮件滥用报告中编辑潜在敏感数据

Abstract

摘要

Email messages often contain information that might be considered private or sensitive, per either regulation or social norms. When such a message becomes the subject of a report intended to be shared with other entities, the report generator may wish to redact or elide the sensitive portions of the message. This memo suggests one method for doing so effectively.

根据法规或社会规范,电子邮件通常包含可能被视为隐私或敏感的信息。当此类消息成为打算与其他实体共享的报告的主题时,报告生成器可能希望编辑或删除消息的敏感部分。这份备忘录提出了一种有效的方法。

Status of This Memo

关于下段备忘

This is an Internet Standards Track document.

这是一份互联网标准跟踪文件。

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741.

本文件是互联网工程任务组(IETF)的产品。它代表了IETF社区的共识。它已经接受了公众审查,并已被互联网工程指导小组(IESG)批准出版。有关互联网标准的更多信息,请参见RFC 5741第2节。

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6590.

有关本文件当前状态、任何勘误表以及如何提供反馈的信息,请访问http://www.rfc-editor.org/info/rfc6590.

Copyright Notice

版权公告

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

版权所有(c)2012 IETF信托基金和确定为文件作者的人员。版权所有。

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

本文件受BCP 78和IETF信托有关IETF文件的法律规定的约束(http://trustee.ietf.org/license-info)自本文件出版之日起生效。请仔细阅读这些文件,因为它们描述了您对本文件的权利和限制。从本文件中提取的代码组件必须包括信托法律条款第4.e节中所述的简化BSD许可证文本,并提供简化BSD许可证中所述的无担保。

Table of Contents

目录

   1. Introduction ....................................................2
   2. Key Words .......................................................3
   3. Recommended Practice ............................................3
   4. Transformation Mechanisms .......................................4
   5. Security Considerations .........................................5
      5.1. General ....................................................5
      5.2. Digest Collisions ..........................................5
      5.3. Information Not Redacted ...................................5
   6. Privacy Considerations ..........................................6
   7. References ......................................................6
      7.1. Normative References .......................................6
      7.2. Informative References .....................................6
   Appendix A. Example ................................................7
   Appendix B. Acknowledgements .......................................8
        
   1. Introduction ....................................................2
   2. Key Words .......................................................3
   3. Recommended Practice ............................................3
   4. Transformation Mechanisms .......................................4
   5. Security Considerations .........................................5
      5.1. General ....................................................5
      5.2. Digest Collisions ..........................................5
      5.3. Information Not Redacted ...................................5
   6. Privacy Considerations ..........................................6
   7. References ......................................................6
      7.1. Normative References .......................................6
      7.2. Informative References .....................................6
   Appendix A. Example ................................................7
   Appendix B. Acknowledgements .......................................8
        
1. Introduction
1. 介绍

The Abuse Reporting Format [ARF] defines a message format for sending reports of abuse in the messaging infrastructure, with an eye toward automating both the generation and consumption of those reports.

滥用报告格式[ARF]定义了一种用于在消息传递基础设施中发送滥用报告的消息格式,旨在自动化这些报告的生成和使用。

For privacy considerations, it might be the policy of a report generator to anonymize, or obscure, portions of the report that might identify an end user who caused the report to be generated. This has come to be known in feedback loop parlance as "redaction". Precisely how this is done is unspecified in [ARF], as it will generally be a matter of local policy. That specification does admonish generators against being too overzealous with this practice, as obscuring too much data makes the report non-actionable.

出于隐私考虑,报告生成器的策略可能是匿名或隐藏报告中可能识别导致生成报告的最终用户的部分。这在反馈循环中被称为“编校”。在[ARF]中没有具体说明如何做到这一点,因为这通常是当地政策的问题。该规范确实告诫生成器不要过于热衷于这种做法,因为掩盖太多数据会使报告无法执行。

Previous redaction practices, such as replacing local-parts of addresses with a uniform string like "xxxxxxxx", frustrated any kind of prioritizing or grouping of reports. This memo presents a practice for conducting redaction in a manner that allows a report receiver to detect that two reports were caused by the same end user, without revealing the identity of that user. That is, the report receiver can use the redacted string, such as an obscured email address, to determine that two such unredacted strings were identical; the reports originally contained the same address.

以前的编校实践,如用统一字符串(如“xxxxxxxx”)替换地址的本地部分,阻碍了任何类型的报告优先级排序或分组。本备忘录介绍了以一种方式进行编辑的实践,该方式允许报告接收者检测到两份报告是由同一最终用户引起的,而无需透露该用户的身份。也就是说,报告接收者可以使用经过编辑的字符串,例如模糊的电子邮件地址,来确定两个这样的未经编辑的字符串是相同的;这些报告最初载有同一地址。

Generally, it is assumed that the recipient-identifying fields of a message, when copied into a report, are to be obscured to protect the identity of the end user who submitted the complaint about the message. However, it is also presumed that other data will be left intact, and those data could be correlated against log files or other resources to determine the intended recipient of the original message.

通常,假定在复制到报告中时,邮件的收件人标识字段将被隐藏,以保护提交邮件投诉的最终用户的身份。但是,还假定其他数据将保持不变,并且这些数据可以与日志文件或其他资源关联,以确定原始消息的预期收件人。

2. Key Words
2. 关键词

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [KEYWORDS].

本文件中的关键词“必须”、“不得”、“必需”、“应”、“不应”、“应”、“不应”、“建议”、“可”和“可选”应按照[关键词]中所述进行解释。

3. Recommended Practice
3. 推荐做法

When redacting of reports is desired, in order to enable a report receiver to correlate reports that might refer to a common but anonymous source, the report generator SHOULD use the following practice:

当需要对报告进行编校时,为了使报告接收者能够关联可能引用公共但匿名来源的报告,报告生成器应采用以下做法:

1. Select a transformation mechanism (see Section 4) that is consistent (i.e., the same input string produces the same output each time) and reasonably collision-resistant (i.e., two different inputs are unlikely to produce the same output).

1. 选择一致(即,相同的输入字符串每次产生相同的输出)和合理的抗冲突(即,两个不同的输入不太可能产生相同的输出)的转换机制(参见第4节)。

2. Identify string(s) (such as local-parts of email addresses) in a message that need to be redacted. Call these strings the "private data".

2. 识别邮件中需要编辑的字符串(如电子邮件地址的本地部分)。将这些字符串称为“私有数据”。

3. For each piece of private data, apply the selected transformation mechanism.

3. 对于每段私有数据,应用选定的转换机制。

4. If the output of the transformation can contain bytes that are not printable ASCII, or if the output can include characters not appropriate to replace the private data directly, encode the output with the base64 algorithm as defined in Section 4 of [BASE64], or some similar translation, to form a valid replacement in the original context. For example, replacing a local-part in an email address with transformation output containing an "@" character (ASCII 0x40) or a space character (ASCII 0x20) is not permitted by the specification for local-part [SMTP], so the transformation output needs to be encoded as described.

4. 如果转换的输出可能包含不可打印的ASCII字节,或者如果输出可能包含不适合直接替换私有数据的字符,则使用[base64]第4节中定义的base64算法或类似翻译对输出进行编码,以在原始上下文中形成有效替换。例如,本地部分[SMTP]规范不允许将电子邮件地址中的本地部分替换为包含“@”字符(ASCII 0x40)或空格字符(ASCII 0x20)的转换输出,因此需要按照所述对转换输出进行编码。

5. Replace each instance of private data with the corresponding (possibly encoded) transformation when generating the report. Note that the replaced text could also be in a context that has constraints, such as length limits that need to be observed.

5. 生成报告时,用相应的(可能编码的)转换替换私有数据的每个实例。请注意,被替换的文本也可能位于具有约束的上下文中,例如需要遵守的长度限制。

This has the effect of obscuring the data (in a potentially irreversible way) while still allowing the report recipient to observe that numerous reports are about one particular end user. Such detection enables the receiver to prioritize its reactions based on problems that appear to be focused on specific end users that may be under attack.

这会使数据变得模糊(以一种可能不可逆转的方式),同时仍然允许报告接收者观察到许多报告都是关于一个特定的最终用户的。这种检测使接收者能够根据似乎集中在可能受到攻击的特定终端用户上的问题对其反应进行优先级排序。

4. Transformation Mechanisms
4. 转化机制

This memo does not specify a particular transformation mechanism as a requirement. The interoperability that this memo seeks to provide is enabled by the consistency of the transformation.

本备忘录未将特定转换机制指定为要求。本备忘录试图提供的互操作性是由转换的一致性实现的。

Dealing with the issue of the security of the transformation (i.e., frustrating attempts to reverse the transformation) is a matter of local policy. A continuum of possible transformations exists, from trivial ones such as rot13, CRC32, and base64, through strong cryptographic encodings such as the Hashed Message Authentication Code [HMAC] and even full encryption, or private transformations such as mapping an email address to an internal customer number. An operator wishing to perform report redaction needs to select a consistent transformation that obscures the private data and is resilient to attempts to extract the original data to the extent required by local policy, keeping in mind that the environment in which the transformation is operating is not a highly secure one. See Section 5.3 for further details of this issue.

处理转型的安全性问题(即逆转转型的失败尝试)是一个地方政策问题。存在一系列可能的转换,从普通转换(如rot13、CRC32和base64),到强加密编码(如散列消息身份验证码[HMAC]),甚至完全加密,或私有转换(如将电子邮件地址映射到内部客户号)。希望执行报告编校的运营商需要选择一个一致的转换,该转换将隐藏私有数据,并且能够在本地策略要求的范围内灵活地尝试提取原始数据,同时要记住,转换运行的环境不是高度安全的环境。有关此问题的更多详细信息,请参见第5.3节。

An implementation MAY choose any transformation that has a reasonably low likelihood of collision.

实现可以选择任何冲突可能性相当低的转换。

5. Security Considerations
5. 安全考虑
5.1. General
5.1. 全体的

General security issues with respect to these reports are found in [ARF].

与这些报告相关的一般安全问题见[ARF]。

5.2. Digest Collisions
5.2. 消化碰撞

Message digest collisions are a well-understood issue. Their application here involves a report receiver improperly concluding that two pieces of redacted information were originally the same when in fact they are not. This can lead to a denial of service, where the inadvertently improper application of complaint data causes unjustified corrective action. Such cases are sufficiently unlikely as to be of little concern.

消息摘要冲突是一个众所周知的问题。他们在这里的应用涉及到一个报告接收者错误地得出结论,即两条经过编辑的信息原本是相同的,而事实上并非如此。这可能会导致拒绝服务,因为无意中不当应用投诉数据会导致不合理的纠正措施。这种情况不太可能发生,因此不太值得关注。

5.3. Information Not Redacted
5.3. 未编辑的信息

Although the identity of the user causing a report to be generated can be obscured using this mechanism, other properties of a message (such as the Message-ID field) that are not redacted could be used to recover the original data by locating them in the message logs of the originating system or via other data correlation techniques. It is incumbent on the report generator to anticipate and redact or otherwise obscure such data, or accept that such recovery is possible even from the very simplest kinds of feedback.

尽管使用该机制可以模糊导致生成报告的用户的身份,但是未经编辑的消息的其他属性(例如消息ID字段)可以通过在发起系统的消息日志中定位原始数据或通过其他数据关联技术来用于恢复原始数据。报告生成器有责任预测、编辑或以其他方式隐藏此类数据,或接受即使是最简单的反馈也可以进行此类恢复。

It is for this reason that the normative portions of this memo do not include stronger assertions about cryptography used in the transformation. Given the ultimate recoverability of the redacted information, the cryptographic strength of the transformation is not a critical security measure.

正是由于这个原因,本备忘录的规范性部分没有包含关于转换中使用的加密的更有力的断言。考虑到编辑信息的最终可恢复性,转换的加密强度不是一个关键的安全措施。

The process of redacting a feedback report satisfies a privacy requirement established by local policy, and is not meant to provide strong security properties.

编辑反馈报告的过程满足当地政策制定的隐私要求,并不意味着提供强大的安全属性。

[FBL-BCP] and Section 8 of [ARF] discuss topics related to establishment of bilateral agreements between report producers and consumers. The issues raised here are also things to be considered when establishing such agreements.

[FBL-BCP]和[ARF]第8节讨论了与报告生产者和消费者之间建立双边协议相关的主题。这里提出的问题也是制定此类协议时需要考虑的问题。

6. Privacy Considerations
6. 隐私考虑

While the method of redaction described in this document may reduce the likelihood of some types of private data from leaking between ADministrative Management Domains (ADMDs), it is extremely unlikely that report generation software could ever be created to recognize all of the different ways that private information could be expressed through human written language. If further protections are required, implementers may wish to consider establishing some sort of out-of-band arrangements between the relevant entities, to contain private data as much as possible.

虽然本文档中描述的编校方法可能会降低某些类型的私有数据在管理管理域(ADMD)之间泄漏的可能性,创建报告生成软件来识别私人信息通过人类书面语言表达的所有不同方式是极不可能的。如果需要进一步的保护,实施者可能希望考虑在相关实体之间建立某种带外安排,尽可能地包含私有数据。

7. References
7. 工具书类
7.1. Normative References
7.1. 规范性引用文件

[ARF] Shafranovich, Y., Levine, J., and M. Kucherawy, "An Extensible Format for Email Feedback Reports", RFC 5965, August 2010.

[ARF]Shafranovich,Y.,Levine,J.,和M.Kucherawy,“电子邮件反馈报告的可扩展格式”,RFC 59652010年8月。

[BASE64] Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, October 2006.

[BASE64]Josefsson,S.,“Base16、Base32和BASE64数据编码”,RFC4648,2006年10月。

[KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[关键词]Bradner,S.,“RFC中用于表示需求水平的关键词”,BCP 14,RFC 2119,1997年3月。

7.2. Informative References
7.2. 资料性引用

[FBL-BCP] Falk, J., Ed., "Complaint Feedback Loop Operational Recommendations", RFC 6449, November 2011.

[FBL-BCP]Falk,J.,Ed.,“投诉反馈回路操作建议”,RFC 6449,2011年11月。

[HMAC] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing for Message Authentication", RFC 2104, February 1997.

[HMAC]Krawczyk,H.,Bellare,M.,和R.Canetti,“HMAC:用于消息身份验证的键控哈希”,RFC 2104,1997年2月。

[SMTP] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321, October 2008.

[SMTP]Klensin,J.,“简单邮件传输协议”,RFC 53212008年10月。

Appendix A. Example
附录A.示例

Assume the following input message:

假设以下输入消息:

     From: alice@example.com
     To: bob@example.net
     Subject: Make money fast!
     Message-ID: <123456789@mailer.example.com>
     Date: Thu, 17 Nov 2011 22:19:40 -0500
        
     From: alice@example.com
     To: bob@example.net
     Subject: Make money fast!
     Message-ID: <123456789@mailer.example.com>
     Date: Thu, 17 Nov 2011 22:19:40 -0500
        
     Want to make a lot of money really fast?  Check it out!
     http://www.example.com/scam/0xd0d0cafe
        
     Want to make a lot of money really fast?  Check it out!
     http://www.example.com/scam/0xd0d0cafe
        

On receipt, bob@example.net reports this message as abusive through whatever mechanism his mailbox provider has established. This causes an [ARF] message to be generated. However, example.net wishes to obscure Bob's email address lest it be relayed to the offending agent, which could lead to more trouble for Bob.

收到后,bob@example.net通过邮箱提供商建立的任何机制将此邮件报告为滥用。这导致生成[ARF]消息。但是,example.net希望隐藏Bob的电子邮件地址,以免将其转发给违规代理,这可能会给Bob带来更多麻烦。

Thus, example.net plans to redact the local-part of the recipient address in the To: field. Local policy and security requirements suggest that the algorithm known as "H" (a hash of a key concatenated with the data to be obscured) using SHA1 is adequate. It has thus selected a redaction key of "potatoes", and the private data in this case is the string "bob". The concatenation of "potatoesbob" is digested with SHA1 and then base64-encoded to the string "rZ8cqXWGiKHzhz1MsFRGTysHia4=".

因此,example.net计划在to:字段中对收件人地址的本地部分进行修订。本地策略和安全要求表明,使用SHA1的称为“H”(与要隐藏的数据连接的密钥散列)的算法是足够的。因此,它选择了一个编校键“potatos”,本例中的私有数据是字符串“bob”。“potatoesbob”的串联用SHA1进行分解,然后base64编码为字符串“rZ8cqXWGiKHzhz1MsFRGTysHia4=”。

Therefore, when constructing the ARF message in response to Bob's complaint, the following form of the received message is used in the third part of the ARF report:

因此,在响应Bob的投诉构建ARF消息时,ARF报告的第三部分使用以下形式的接收消息:

     From: alice@example.com
     To: rZ8cqXWGiKHzhz1MsFRGTysHia4=@example.net
     Subject: Make money fast!
     Message-ID: <123456789@mailer.example.com>
     Date: Thu, 17 Nov 2011 22:19:40 -0500
        
     From: alice@example.com
     To: rZ8cqXWGiKHzhz1MsFRGTysHia4=@example.net
     Subject: Make money fast!
     Message-ID: <123456789@mailer.example.com>
     Date: Thu, 17 Nov 2011 22:19:40 -0500
        
     Want to make a lot of money really fast?  Check it out!
     http://www.example.com/scam/0xd0d0cafe
        
     Want to make a lot of money really fast?  Check it out!
     http://www.example.com/scam/0xd0d0cafe
        

Note, however, that it is possible that the redacted information can be recovered by agents at example.com searching their logs for the original envelope associated with the message, by correlating with the Message-ID contents, which were not redacted here. It is expected that feedback loops generating such reports involve senders that have been vetted against such information leakage.

但是,请注意,example.com上的代理可以通过与此处未编辑的消息ID内容相关联的方式,在其日志中搜索与消息关联的原始信封,从而恢复已编辑的信息。预计生成此类报告的反馈回路涉及已针对此类信息泄漏进行审查的发送者。

Appendix B. Acknowledgements
附录B.确认书

Much of the text in this document was initially moved from other MARF working group documents, with contributions from Monica Chew, Tim Draegen, Michael Adkins, and other members of the Messaging Anti-Abuse Working Group. Additional feedback was provided by John Levine, S. Moonesamy, Alessandro Vesely, and Mykyta Yevstifeyev.

本文件中的大部分文本最初是从MARF工作组的其他文件中转移过来的,由Monica Chew、Tim Draegen、Michael Adkins和消息传递反虐待工作组的其他成员提供。John Levine、S.Moonesay、Alessandro Vesely和Mykyta Yevstifeyev提供了其他反馈。

Authors' Addresses

作者地址

J.D. Falk (editor) Return Path 100 Mathilda Place, Suite 100 Sunnyvale, CA 94086 US

J.D.福尔克(编辑)美国加利福尼亚州桑尼维尔市玛蒂尔达广场100号100室返回路径94086

   EMail: ietf@cybernothing.org
   URI:   http://www.returnpath.net/
        
   EMail: ietf@cybernothing.org
   URI:   http://www.returnpath.net/
        

M. Kucherawy (editor) Cloudmark 128 King St., 2nd Floor San Francisco, CA 94107 US

M. Kucherawy(编辑)CuldMax 128国王圣,第二楼旧金山,CA 94107美国

   EMail: msk@cloudmark.com
        
   EMail: msk@cloudmark.com