Network Working Group                                  S. Josefsson, Ed.
Request for Comments: 3548                                     July 2003
Category: Informational
        
Network Working Group                                  S. Josefsson, Ed.
Request for Comments: 3548                                     July 2003
Category: Informational
        

The Base16, Base32, and Base64 Data Encodings

Base16、Base32和Base64数据编码

Status of this Memo

本备忘录的状况

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

本备忘录为互联网社区提供信息。它没有规定任何类型的互联网标准。本备忘录的分发不受限制。

Copyright Notice

版权公告

Copyright (C) The Internet Society (2003). All Rights Reserved.

版权所有(C)互联网协会(2003年)。版权所有。

Abstract

摘要

This document describes the commonly used base 64, base 32, and base 16 encoding schemes. It also discusses the use of line-feeds in encoded data, use of padding in encoded data, use of non-alphabet characters in encoded data, and use of different encoding alphabets.

本文档介绍了常用的base 64、base 32和base 16编码方案。还讨论了在编码数据中使用换行、在编码数据中使用填充、在编码数据中使用非字母字符以及使用不同的编码字母。

Table of Contents

目录

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  2
   2.  Implementation discrepancies . . . . . . . . . . . . . . . . .  2
       2.1.  Line feeds in encoded data . . . . . . . . . . . . . . .  2
       2.2.  Padding of encoded data  . . . . . . . . . . . . . . . .  3
       2.3.  Interpretation of non-alphabet characters in encoded
             data . . . . . . . . . . . . . . . . . . . . . . . . . .  3
       2.4.  Choosing the alphabet  . . . . . . . . . . . . . . . . .  3
   3.  Base 64 Encoding . . . . . . . . . . . . . . . . . . . . . . .  4
   4.  Base 64 Encoding with URL and Filename Safe Alphabet . . . . .  6
   5.  Base 32 Encoding . . . . . . . . . . . . . . . . . . . . . . .  6
   6.  Base 16 Encoding . . . . . . . . . . . . . . . . . . . . . . .  8
   7.  Illustrations and examples . . . . . . . . . . . . . . . . . .  9
   8.  Security Considerations  . . . . . . . . . . . . . . . . . . . 10
   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 11
       9.1.  Normative References . . . . . . . . . . . . . . . . . . 11
       9.2.  Informative References . . . . . . . . . . . . . . . . . 11
   10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11
   11. Editor's Address . . . . . . . . . . . . . . . . . . . . . . . 12
   12. Full Copyright Statement . . . . . . . . . . . . . . . . . . . 13
        
   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  2
   2.  Implementation discrepancies . . . . . . . . . . . . . . . . .  2
       2.1.  Line feeds in encoded data . . . . . . . . . . . . . . .  2
       2.2.  Padding of encoded data  . . . . . . . . . . . . . . . .  3
       2.3.  Interpretation of non-alphabet characters in encoded
             data . . . . . . . . . . . . . . . . . . . . . . . . . .  3
       2.4.  Choosing the alphabet  . . . . . . . . . . . . . . . . .  3
   3.  Base 64 Encoding . . . . . . . . . . . . . . . . . . . . . . .  4
   4.  Base 64 Encoding with URL and Filename Safe Alphabet . . . . .  6
   5.  Base 32 Encoding . . . . . . . . . . . . . . . . . . . . . . .  6
   6.  Base 16 Encoding . . . . . . . . . . . . . . . . . . . . . . .  8
   7.  Illustrations and examples . . . . . . . . . . . . . . . . . .  9
   8.  Security Considerations  . . . . . . . . . . . . . . . . . . . 10
   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 11
       9.1.  Normative References . . . . . . . . . . . . . . . . . . 11
       9.2.  Informative References . . . . . . . . . . . . . . . . . 11
   10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11
   11. Editor's Address . . . . . . . . . . . . . . . . . . . . . . . 12
   12. Full Copyright Statement . . . . . . . . . . . . . . . . . . . 13
        
1. Introduction
1. 介绍

Base encoding of data is used in many situations to store or transfer data in environments that, perhaps for legacy reasons, are restricted to only US-ASCII [9] data. Base encoding can also be used in new applications that do not have legacy restrictions, simply because it makes it possible to manipulate objects with text editors.

在许多情况下,数据的基本编码用于在仅限于US-ASCII[9]数据的环境中存储或传输数据,这可能是由于遗留原因。基本编码也可以在没有遗留限制的新应用程序中使用,这仅仅是因为它可以使用文本编辑器操纵对象。

In the past, different applications have had different requirements and thus sometimes implemented base encodings in slightly different ways. Today, protocol specifications sometimes use base encodings in general, and "base64" in particular, without a precise description or reference. MIME [3] is often used as a reference for base64 without considering the consequences for line-wrapping or non-alphabet characters. The purpose of this specification is to establish common alphabet and encoding considerations. This will hopefully reduce ambiguity in other documents, leading to better interoperability.

在过去,不同的应用程序有不同的需求,因此有时以稍微不同的方式实现基本编码。今天,协议规范有时通常使用基本编码,特别是“base64”,而没有精确的描述或引用。MIME[3]通常用作base64的参考,而不考虑换行或非字母字符的后果。本规范旨在确定通用字母表和编码注意事项。这有望减少其他文档中的歧义,从而实现更好的互操作性。

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [1].

本文件中的关键词“必须”、“不得”、“要求”、“应”、“不应”、“应”、“不应”、“建议”、“可”和“可选”应按照RFC 2119[1]中所述进行解释。

2. Implementation discrepancies
2. 执行差异

Here we discuss the discrepancies between base encoding implementations in the past, and where appropriate, mandate a specific recommended behavior for the future.

在这里,我们讨论过去的基本编码实现之间的差异,并在适当的情况下,为未来指定特定的推荐行为。

2.1. Line feeds in encoded data
2.1. 编码数据中的换行

MIME [3] is often used as a reference for base 64 encoding. However, MIME does not define "base 64" per se, but rather a "base 64 Content-Transfer-Encoding" for use within MIME. As such, MIME enforces a limit on line length of base 64 encoded data to 76 characters. MIME inherits the encoding from PEM [2] stating it is "virtually identical", however PEM uses a line length of 64 characters. The MIME and PEM limits are both due to limits within SMTP.

MIME[3]通常用作Base64编码的参考。然而,MIME本身并没有定义“base64”,而是定义了一种在MIME中使用的“base64内容传输编码”。同样地,MIME将base 64编码数据的行长度限制为76个字符。MIME继承了PEM[2]的编码,称其“几乎相同”,但PEM使用64个字符的行长。MIME和PEM限制都是由于SMTP中的限制造成的。

Implementations MUST NOT not add line feeds to base encoded data unless the specification referring to this document explicitly directs base encoders to add line feeds after a specific number of characters.

除非参考本文档的规范明确指示基本编码器在特定数量的字符后添加换行符,否则实现不得向基本编码数据添加换行符。

2.2. Padding of encoded data
2.2. 编码数据的填充

In some circumstances, the use of padding ("=") in base encoded data is not required nor used. In the general case, when assumptions on size of transported data cannot be made, padding is required to yield correct decoded data.

在某些情况下,不需要也不使用基址编码数据中的填充(“=”)。在一般情况下,当无法对传输数据的大小进行假设时,需要填充以产生正确的解码数据。

Implementations MUST include appropriate pad characters at the end of encoded data unless the specification referring to this document explicitly states otherwise.

实现必须在编码数据的末尾包含适当的pad字符,除非引用本文档的规范明确规定了其他内容。

2.3. Interpretation of non-alphabet characters in encoded data
2.3. 编码数据中非字母字符的解释

Base encodings use a specific, reduced, alphabet to encode binary data. Non alphabet characters could exist within base encoded data, caused by data corruption or by design. Non alphabet characters may be exploited as a "covert channel", where non-protocol data can be sent for nefarious purposes. Non alphabet characters might also be sent in order to exploit implementation errors leading to, e.g., buffer overflow attacks.

基本编码使用特定的简化字母表对二进制数据进行编码。由于数据损坏或设计原因,非字母字符可能存在于基本编码数据中。非字母字符可能被利用为“隐蔽通道”,其中非协议数据可能被发送用于恶意目的。还可能发送非字母字符,以利用导致缓冲区溢出攻击等实现错误。

Implementations MUST reject the encoding if it contains characters outside the base alphabet when interpreting base encoded data, unless the specification referring to this document explicitly states otherwise. Such specifications may, as MIME does, instead state that characters outside the base encoding alphabet should simply be ignored when interpreting data ("be liberal in what you accept"). Note that this means that any CRLF constitute "non alphabet characters" and are ignored. Furthermore, such specifications may consider the pad character, "=", as not part of the base alphabet until the end of the string. If more than the allowed number of pad characters are found at the end of the string, e.g., a base 64 string terminated with "===", the excess pad characters could be ignored.

在解释基本编码数据时,如果编码包含基本字母表之外的字符,则实现必须拒绝该编码,除非引用本文档的规范明确规定了其他内容。这些规范可能会像MIME一样,声明在解释数据时应忽略基本编码字母表之外的字符(“在您接受的内容上要自由”)。请注意,这意味着任何CRLF都构成“非字母字符”,将被忽略。此外,这样的规范可以考虑PAD字符“=”,而不是基字母表的一部分,直到字符串的结尾。如果在字符串末尾发现超过允许数量的填充字符,例如以“==”结尾的基64字符串,则可以忽略多余的填充字符。

2.4. Choosing the alphabet
2.4. 选择字母表

Different applications have different requirements on the characters in the alphabet. Here are a few requirements that determine which alphabet should be used:

不同的应用程序对字母表中的字符有不同的要求。以下是确定应使用哪个字母表的一些要求:

o Handled by humans. Characters "0", "O" are easily interchanged, as well "1", "l" and "I". In the base32 alphabet below, where 0 (zero) and 1 (one) is not present, a decoder may interpret 0 as O, and 1 as I or L depending on case. (However, by default it should not, see previous section.)

o 由人类操纵。字符“0”、“O”以及“1”、“l”和“I”易于互换。在下面的base32字母表中,如果0(零)和1(一)不存在,解码器可能会根据情况将0解释为O,将1解释为I或L。(但是,默认情况下不应如此,请参见上一节。)

o Encoded into structures that place other requirements. For base 16 and base 32, this determines the use of upper- or lowercase alphabets. For base 64, the non-alphanumeric characters (in particular "/") may be problematic in file names and URLs.

o 编码到放置其他需求的结构中。对于基数16和基数32,这决定了使用大写或小写字母。对于base 64,文件名和URL中的非字母数字字符(尤其是“/”)可能有问题。

o Used as identifiers. Certain characters, notably "+" and "/" in the base 64 alphabet, are treated as word-breaks by legacy text search/index tools.

o 用作标识符。某些字符,尤其是64进制字母表中的“+”和“/”被传统的文本搜索/索引工具视为分词。

There is no universally accepted alphabet that fulfills all the requirements. In this document, we document and name some currently used alphabets.

没有一种普遍接受的字母表能够满足所有的要求。在本文档中,我们记录并命名了一些当前使用的字母表。

3. Base 64 Encoding
3. 基64编码

The following description of base 64 is due to [2], [3], [4] and [5].

下面对base 64的描述是由于[2]、[3]、[4]和[5]引起的。

The Base 64 encoding is designed to represent arbitrary sequences of octets in a form that requires case sensitivity but need not be humanly readable.

Base64编码设计用于以需要区分大小写但不需要人类可读的形式表示任意八位字节序列。

A 65-character subset of US-ASCII is used, enabling 6 bits to be represented per printable character. (The extra 65th character, "=", is used to signify a special processing function.)

使用US-ASCII的65个字符子集,使每个可打印字符能够表示6位。(额外的第65个字符“=”用于表示特殊处理功能。)

The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from left to right, a 24-bit input group is formed by concatenating 3 8-bit input groups. These 24 bits are then treated as 4 concatenated 6-bit groups, each of which is translated into a single digit in the base 64 alphabet.

编码过程将输入位的24位组表示为4个编码字符的输出字符串。从左到右,通过连接3个8位输入组形成24位输入组。然后,这24位被视为4个串联的6位组,每个组被转换为64位字母表中的一个数字。

Each 6-bit group is used as an index into an array of 64 printable characters. The character referenced by the index is placed in the output string.

每个6位组用作64个可打印字符数组的索引。索引引用的字符被放置在输出字符串中。

Table 1: The Base 64 Alphabet

表1:基本64字母表

Value Encoding Value Encoding Value Encoding Value Encoding 0 A 17 R 34 i 51 z 1 B 18 S 35 j 52 0 2 C 19 T 36 k 53 1 3 D 20 U 37 l 54 2 4 E 21 V 38 m 55 3 5 F 22 W 39 n 56 4 6 G 23 X 40 o 57 5 7 H 24 Y 41 p 58 6 8 I 25 Z 42 q 59 7 9 J 26 a 43 r 60 8 10 K 27 b 44 s 61 9 11 L 28 c 45 t 62 + 12 M 29 d 46 u 63 / 13 N 30 e 47 v 14 O 31 f 48 w (pad) = 15 P 32 g 49 x 16 Q 33 h 50 y

数值编码数值编码数值编码数值编码数值编码0a17r34i51z1b18s35j520c19t36k531d20u37l542e21v21v21v38m5535f22w39n5646g23x40o577h24y41p5868i25z42q597j26a43r60810k27b6111l28t62+12m29d46u63/13n30e47v31f48w(pad)=15 P 32 g 49 x 16 Q 33 h 50 y

Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a quantity. When fewer than 24 input bits are available in an input group, zero bits are added (on the right) to form an integral number of 6-bit groups. Padding at the end of the data is performed using the '=' character. Since all base 64 input is an integral number of octets, only the following cases can arise:

如果在编码的数据末尾可用的位少于24位,则执行特殊处理。一个完整的编码量总是在一个量的末尾完成。当一个输入组中可用的输入位少于24位时,将添加零位(右侧)以形成整数个6位组。数据末尾的填充使用“=”字符执行。由于所有base 64输入都是八位字节的整数,因此只能出现以下情况:

(1) the final quantum of encoding input is an integral multiple of 24 bits; here, the final unit of encoded output will be an integral multiple of 4 characters with no "=" padding,

(1) 编码输入的最终量是24位的整数倍;这里,编码输出的最终单位将是4个字符的整数倍,没有“=”填充,

(2) the final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by two "=" padding characters, or

(2) 编码输入的最终量正好是8位;这里,编码输出的最终单位是两个字符,后跟两个“=”填充字符,或

(3) the final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be three characters followed by one "=" padding character.

(3) 编码输入的最终量正好是16位;这里,编码输出的最终单位是三个字符,后跟一个“=”填充字符。

4. Base 64 Encoding with URL and Filename Safe Alphabet
4. 使用URL和文件名安全字母表的Base 64编码

The Base 64 encoding with an URL and filename safe alphabet has been used in [8].

[8]中使用了带有URL和文件名安全字母表的Base 64编码。

An alternative alphabet has been suggested that used "~" as the 63rd character. Since the "~" character has special meaning in some file system environments, the encoding described in this section is recommended instead.

有人建议另一种字母表使用“~”作为第63个字符。由于“~”字符在某些文件系统环境中具有特殊意义,因此建议改用本节中描述的编码。

This encoding should not be regarded as the same as the "base64" encoding, and should not be referred to as only "base64". Unless made clear, "base64" refer to the base 64 in the previous section.

此编码不应视为与“base64”编码相同,也不应仅称为“base64”。除非明确说明,“base64”指上一节中的base64。

This encoding is technically identical to the previous one, except for the 62:nd and 63:rd alphabet character, as indicated in table 2.

除表2中所示的62:nd和63:rd字母字符外,该编码在技术上与前一种编码相同。

Table 2: The "URL and Filename safe" Base 64 Alphabet

表2:“URL和文件名安全”基本64个字母

Value Encoding Value Encoding Value Encoding Value Encoding 0 A 17 R 34 i 51 z 1 B 18 S 35 j 52 0 2 C 19 T 36 k 53 1 3 D 20 U 37 l 54 2 4 E 21 V 38 m 55 3 5 F 22 W 39 n 56 4 6 G 23 X 40 o 57 5 7 H 24 Y 41 p 58 6 8 I 25 Z 42 q 59 7 9 J 26 a 43 r 60 8 10 K 27 b 44 s 61 9 11 L 28 c 45 t 62 - (minus) 12 M 29 d 46 u 63 _ (understrike) 13 N 30 e 47 v 14 O 31 f 48 w (pad) = 15 P 32 g 49 x 16 Q 33 h 50 y

数值编码数值编码数值编码数值编码数值编码0 A 17 R 34 i 51 z 1 B 18 S 35 j 52 0 2 C 19 T 36 k 53 1 3 D 20 U 37 l 54 2 E 21 V 38 m 55 3 5 F 22 W 39 n 56 4 6 G 23 X 40 o 57 5 H 24 Y 41 p 58 6 8 i 25 z 42 q 59 7 9 j 26 A 43 R 60 8 10 k 27 B 44 S 61 11 l 28 C 45 T 62-(负)12 m 29 D 46 U 63(攻击下)13N30E47V31F48W(焊盘)=15P32G49X16Q33H50Y

5. Base 32 Encoding
5. 基32编码

The following description of base 32 is due to [7] (with corrections).

下面对基数32的描述是由于[7](带更正)。

The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but need not be humanly readable.

Base 32编码设计用于表示任意八位字节序列,其形式需要不区分大小写,但不需要人类可读。

A 33-character subset of US-ASCII is used, enabling 5 bits to be represented per printable character. (The extra 33rd character, "=", is used to signify a special processing function.)

使用US-ASCII的33个字符子集,使每个可打印字符能够表示5位。(额外的第33个字符“=”用于表示特殊处理功能。)

The encoding process represents 40-bit groups of input bits as output strings of 8 encoded characters. Proceeding from left to right, a 40-bit input group is formed by concatenating 5 8bit input groups. These 40 bits are then treated as 8 concatenated 5-bit groups, each of which is translated into a single digit in the base 32 alphabet. When encoding a bit stream via the base 32 encoding, the bit stream must be presumed to be ordered with the most-significant-bit first. That is, the first bit in the stream will be the high-order bit in the first 8bit byte, and the eighth bit will be the low-order bit in the first 8bit byte, and so on.

编码过程将40位输入位组表示为8个编码字符的输出字符串。从左到右,通过连接5个8位输入组形成40位输入组。然后,这40位被视为8个串联的5位组,每个组被转换为32个基本字母表中的一个数字。当通过基32编码对比特流进行编码时,必须假定比特流以最高有效位优先排序。也就是说,流中的第一位将是前8位字节中的高位,第八位将是前8位字节中的低位,依此类推。

Each 5-bit group is used as an index into an array of 32 printable characters. The character referenced by the index is placed in the output string. These characters, identified in Table 2, below, are selected from US-ASCII digits and uppercase letters.

每个5位组用作32个可打印字符数组的索引。索引引用的字符被放置在输出字符串中。这些字符(见下表2)选自US-ASCII数字和大写字母。

Table 3: The Base 32 Alphabet

表3:32个基本字母

Value Encoding Value Encoding Value Encoding Value Encoding 0 A 9 J 18 S 27 3 1 B 10 K 19 T 28 4 2 C 11 L 20 U 29 5 3 D 12 M 21 V 30 6 4 E 13 N 22 W 31 7 5 F 14 O 23 X 6 G 15 P 24 Y (pad) = 7 H 16 Q 25 Z 8 I 17 R 26 2

数值编码数值编码数值编码数值编码数值编码0a9j18s2731b10k19t2842c11l20u295d12m21v3064e13n22w3175f14o23x6g15p24y(pad)=7h16q25z8i17r262

Special processing is performed if fewer than 40 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a body. When fewer than 40 input bits are available in an input group, zero bits are added (on the right) to form an integral number of 5-bit groups. Padding at the end of the data is performed using the "=" character. Since all base 32 input is an integral number of octets, only the following cases can arise:

如果在编码数据的末尾可用的比特数少于40,则执行特殊处理。一个完整的编码量总是在主体的末尾完成。当一个输入组中可用的输入位少于40位时,将添加零位(右侧)以形成整数个5位组。数据末尾的填充使用“=”字符执行。由于所有基32输入都是八位字节的整数,因此只能出现以下情况:

(1) the final quantum of encoding input is an integral multiple of 40 bits; here, the final unit of encoded output will be an integral multiple of 8 characters with no "=" padding,

(1) 编码输入的最终量是40位的整数倍;这里,编码输出的最终单位将是8个字符的整数倍,没有“=”填充,

(2) the final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by six "=" padding characters,

(2) 编码输入的最终量正好是8位;这里,编码输出的最终单位是两个字符,后跟六个“=”填充字符,

(3) the final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be four characters followed by four "=" padding characters,

(3) 编码输入的最终量正好是16位;这里,编码输出的最终单位是四个字符,后跟四个“=”填充字符,

(4) the final quantum of encoding input is exactly 24 bits; here, the final unit of encoded output will be five characters followed by three "=" padding characters, or

(4) 编码输入的最终量正好是24位;这里,编码输出的最终单位是五个字符,后跟三个“=”填充字符,或

(5) the final quantum of encoding input is exactly 32 bits; here, the final unit of encoded output will be seven characters followed by one "=" padding character.

(5) 编码输入的最终量正好是32位;这里,编码输出的最终单位是七个字符,后跟一个“=”填充字符。

6. Base 16 Encoding
6. 基16编码

The following description is original but analogous to previous descriptions. Essentially, Base 16 encoding is the standard standard case insensitive hex encoding, and may be referred to as "base16" or "hex".

以下描述为原始描述,但与之前的描述类似。基本上,Base 16编码是标准的不区分大小写的十六进制编码,可以称为“Base 16”或“hex”。

A 16-character subset of US-ASCII is used, enabling 4 bits to be represented per printable character.

使用US-ASCII的16个字符子集,使每个可打印字符能够表示4位。

The encoding process represents 8-bit groups (octets) of input bits as output strings of 2 encoded characters. Proceeding from left to right, a 8-bit input is taken from the input data. These 8 bits are then treated as 2 concatenated 4-bit groups, each of which is translated into a single digit in the base 16 alphabet.

编码过程将输入位的8位组(八位字节)表示为2个编码字符的输出字符串。从左到右,从输入数据中获取8位输入。然后将这8位视为2个串联的4位组,每个组被转换为16进制字母表中的单个数字。

Each 4-bit group is used as an index into an array of 16 printable characters. The character referenced by the index is placed in the output string.

每个4位组用作16个可打印字符数组的索引。索引引用的字符被放置在输出字符串中。

Table 5: The Base 16 Alphabet

表5:基本16个字母

Value Encoding Value Encoding Value Encoding Value Encoding 0 0 4 4 8 8 12 C 1 1 5 5 9 9 13 D 2 2 6 6 10 A 14 E 3 3 7 7 11 B 15 F

值编码值编码值编码值编码0 0 4 4 8 8 12 C 1 5 5 9 9 13 D 2 6 6 10 A 14 E 3 7 11 B 15 F

Unlike base 32 and base 64, no special padding is necessary since a full code word is always available.

与base 32和base 64不同,由于完整的码字始终可用,因此不需要特殊的填充。

7. Illustrations and examples
7. 插图和例子

To translate between binary and a base encoding, the input is stored in a structure and the output is extracted. The case for base 64 is displayed in the following figure, borrowed from [4].

为了在二进制编码和基编码之间转换,输入存储在一个结构中,输出被提取。下图显示了base 64的案例,借用自[4]。

         +--first octet--+-second octet--+--third octet--+
         |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
         +-----------+---+-------+-------+---+-----------+
         |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|
         +--1.index--+--2.index--+--3.index--+--4.index--+
        
         +--first octet--+-second octet--+--third octet--+
         |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
         +-----------+---+-------+-------+---+-----------+
         |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|
         +--1.index--+--2.index--+--3.index--+--4.index--+
        

The case for base 32 is shown in the following figure, borrowed from [6]. Each successive character in a base-32 value represents 5 successive bits of the underlying octet sequence. Thus, each group of 8 characters represents a sequence of 5 octets (40 bits).

下图显示了base 32的情况,借用自[6]。base-32值中的每个连续字符表示基础八位字节序列的5个连续位。因此,每组8个字符代表5个八位字节(40位)的序列。

                        1          2          3
          01234567 89012345 67890123 45678901 23456789
         +--------+--------+--------+--------+--------+
         |< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >|
         +--------+--------+--------+--------+--------+
                                                 <===> 8th character
                                           <====> 7th character
                                      <===> 6th character
                                <====> 5th character
                          <====> 4th character
                     <===> 3rd character
               <====> 2nd character
          <===> 1st character
        
                        1          2          3
          01234567 89012345 67890123 45678901 23456789
         +--------+--------+--------+--------+--------+
         |< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >|
         +--------+--------+--------+--------+--------+
                                                 <===> 8th character
                                           <====> 7th character
                                      <===> 6th character
                                <====> 5th character
                          <====> 4th character
                     <===> 3rd character
               <====> 2nd character
          <===> 1st character
        

The following example of Base64 data is from [4].

下面的Base64数据示例来自[4]。

Input data: 0x14fb9c03d97e Hex: 1 4 f b 9 c | 0 3 d 9 7 e 8-bit: 00010100 11111011 10011100 | 00000011 11011001 11111110 6-bit: 000101 001111 101110 011100 | 000000 111101 100111 111110 Decimal: 5 15 46 28 0 61 37 62 Output: F P u c A 9 l +

输入数据:0x14fb9c03d97e十六进制:1 4 f b 9 c | 0 3 d 9 7 e 8位:0000100 11111 011 10011100 | 000000 11 11011001 11111111 0 6位:000101111 101110 011100 | 000000 111101 100111 11111 0十进制:5 15 46 28 0 61 37 62输出:f P u c 9 l+

Input data: 0x14fb9c03d9 Hex: 1 4 f b 9 c | 0 3 d 9 8-bit: 00010100 11111011 10011100 | 00000011 11011001 pad with 00 6-bit: 000101 001111 101110 011100 | 000000 111101 100100 Decimal: 5 15 46 28 0 61 36 pad with = Output: F P u c A 9 k =

输入数据:0x14fb9c03d9十六进制:1 4 f b 9 c | 0 3 d 9 8位:0000100 11111 011 10011100 | 000000 11 11011001焊盘带00 6位:000101001111 101110 01100 | 000000 111101 100100十进制:5 15 46 28 0 61 36焊盘带=输出:f P u c 9 k=

       Input data:  0x14fb9c03
       Hex:     1   4    f   b    9   c     | 0   3
       8-bit:   00010100 11111011 10011100  | 00000011
                                              pad with 0000
       6-bit:   000101 001111 101110 011100 | 000000 110000
       Decimal: 5      15     46     28       0      48
                                                   pad with =      =
       Output:  F      P      u      c        A      w      =      =
        
       Input data:  0x14fb9c03
       Hex:     1   4    f   b    9   c     | 0   3
       8-bit:   00010100 11111011 10011100  | 00000011
                                              pad with 0000
       6-bit:   000101 001111 101110 011100 | 000000 110000
       Decimal: 5      15     46     28       0      48
                                                   pad with =      =
       Output:  F      P      u      c        A      w      =      =
        
8. Security Considerations
8. 安全考虑

When implementing Base encoding and decoding, care should be taken not to introduce vulnerabilities to buffer overflow attacks, or other attacks on the implementation. A decoder should not break on invalid input including, e.g., embedded NUL characters (ASCII 0).

在实现基本编码和解码时,应注意不要在实现中引入缓冲区溢出攻击或其他攻击的漏洞。解码器不应在无效输入时中断,例如,包括嵌入的NUL字符(ASCII 0)。

If non-alphabet characters are ignored, instead of causing rejection of the entire encoding (as recommended), a covert channel that can be used to "leak" information is made possible. The implications of this should be understood in applications that do not follow the recommended practice. Similarly, when the base 16 and base 32 alphabets are handled case insensitively, alteration of case can be used to leak information.

如果忽略非字母字符,而不是导致整个编码被拒绝(如建议的那样),则可以使用可用于“泄漏”信息的隐蔽通道。在不遵循推荐做法的应用程序中,应理解这一点的含义。类似地,当以不区分大小写的方式处理基16和基32字母时,可以使用大小写的更改来泄漏信息。

Base encoding visually hides otherwise easily recognized information, such as passwords, but does not provide any computational confidentiality. This has been known to cause security incidents when, e.g., a user reports details of a network protocol exchange

基本编码直观地隐藏了其他容易识别的信息,如密码,但不提供任何计算机密性。众所周知,当用户报告网络协议交换的详细信息时,这会导致安全事件

(perhaps to illustrate some other problem) and accidentally reveals the password because she is unaware that the base encoding does not protect the password.

(也许是为了说明其他一些问题)并意外地泄露了密码,因为她不知道基本编码不能保护密码。

9. References
9. 工具书类
9.1. Normative References
9.1. 规范性引用文件

[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[1] Bradner,S.,“RFC中用于表示需求水平的关键词”,BCP 14,RFC 2119,1997年3月。

9.2. Informative References
9.2. 资料性引用

[2] Linn, J., "Privacy Enhancement for Internet Electronic Mail: Part I: Message Encryption and Authentication Procedures", RFC 1421, February 1993.

[2] 林恩,J.,“因特网电子邮件的隐私增强:第一部分:信息加密和认证程序”,RFC 14211993年2月。

[3] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

[3] Freed,N.和N.Borenstein,“多用途互联网邮件扩展(MIME)第一部分:互联网邮件正文格式”,RFC 20451996年11月。

[4] Callas, J., Donnerhacke, L., Finney, H. and R. Thayer, "OpenPGP Message Format", RFC 2440, November 1998.

[4] Callas,J.,Donnerhacke,L.,Finney,H.和R.Thayer,“OpenPGP消息格式”,RFC2440,1998年11月。

[5] Eastlake, D., "Domain Name System Security Extensions", RFC 2535, March 1999.

[5] Eastlake,D.,“域名系统安全扩展”,RFC 25351999年3月。

[6] Klyne, G. and L. Masinter, "Identifying Composite Media Features", RFC 2938, September 2000.

[6] Klyne,G.和L.Masinter,“识别复合媒体特征”,RFC 2938,2000年9月。

[7] Myers, J., "SASL GSSAPI mechanisms", Work in Progress.

[7] 迈尔斯,J.,“SASL GSSAPI机制”,正在进行中。

[8] Wilcox-O'Hearn, B., "Post to P2P-hackers mailing list", World Wide Web http://zgp.org/pipermail/p2p-hackers/2001- September/000315.html, September 2001.

[8] Wilcox-O'Hearn,B.,“发布到P2P黑客邮件列表”,万维网http://zgp.org/pipermail/p2p-hackers/2001- 2001年9月/000315.html。

[9] Cerf, V., "ASCII format for Network Interchange", RFC 20, October 1969.

[9] Cerf,V.,“网络交换的ASCII格式”,RFC 20,1969年10月。

10. Acknowledgements
10. 致谢

Several people offered comments and suggestions, including Tony Hansen, Gordon Mohr, John Myers, Chris Newman, and Andrew Sieber. Text used in this document is based on earlier RFCs describing specific uses of various base encodings. The author acknowledges the RSA Laboratories for supporting the work that led to this document.

一些人提出了意见和建议,包括托尼·汉森、戈登·莫尔、约翰·迈尔斯、克里斯·纽曼和安德鲁·西伯。本文档中使用的文本基于早期RFC,描述了各种基本编码的具体用途。作者感谢RSA实验室支持本文档的工作。

11. Editor's Address
11. 编辑地址

Simon Josefsson EMail: simon@josefsson.org

Simon Josefsson电子邮件:simon@josefsson.org

12. Full Copyright Statement
12. 完整版权声明

Copyright (C) The Internet Society (2003). All Rights Reserved.

版权所有(C)互联网协会(2003年)。版权所有。

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

本文件及其译本可复制并提供给他人,对其进行评论或解释或协助其实施的衍生作品可全部或部分编制、复制、出版和分发,不受任何限制,前提是上述版权声明和本段包含在所有此类副本和衍生作品中。但是,不得以任何方式修改本文件本身,例如删除版权通知或对互联网协会或其他互联网组织的引用,除非出于制定互联网标准的需要,在这种情况下,必须遵循互联网标准过程中定义的版权程序,或根据需要将其翻译成英语以外的其他语言。

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assignees.

上述授予的有限许可是永久性的,互联网协会或其继承人或受让人不会撤销。

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

本文件和其中包含的信息是按“原样”提供的,互联网协会和互联网工程任务组否认所有明示或暗示的保证,包括但不限于任何保证,即使用本文中的信息不会侵犯任何权利,或对适销性或特定用途适用性的任何默示保证。

Acknowledgement

确认

Funding for the RFC Editor function is currently provided by the Internet Society.

RFC编辑功能的资金目前由互联网协会提供。