Range of valid characters for a Base64 encoding

既然无缘 2020-12-03 00:30

I am interested in the following:
Is there a list of characters that would never occur as part of a Base64-encoded string?
For example, *

4 answers
  • 2020-12-03 01:04

    Here is what I could turn up: RFC 4648

    It includes this convenient table:

                      Table 1: The Base 64 Alphabet
    
     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 +
        12 M            29 d            46 u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y
    

    So a regular expression that matches any character that should never appear in Base 64 encodings would be:

    [^A-Za-z0-9+/=]
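
To illustrate the regular expression above, here is a minimal Python sketch. The names `NON_BASE64` and `invalid_chars` are made up for this example, and it assumes the standard RFC 4648 alphabet with `=` padding:

```python
import re

# Matches any character outside the standard Base 64 alphabet
# (A-Z, a-z, 0-9, '+', '/') and the '=' padding character.
NON_BASE64 = re.compile(r"[^A-Za-z0-9+/=]")


def invalid_chars(text):
    """Return every character in text that the standard alphabet excludes."""
    return NON_BASE64.findall(text)


print(invalid_chars("SGVsbG8="))  # []
print(invalid_chars("SGVs*G8="))  # ['*']
```

This only checks which characters are present. It does not check that the string has a valid length or that the padding is in the right place.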
    

    However, as kapep's answer points out, this is only the recommendation. Specific implementations might choose a different set of 64 characters. (In fact, even the linked RFC contains an alternative table for URL- and filename-safe encoding, which replaces characters 62 and 63 with - and _ respectively.) So I guess it really depends on the implementation that created the encoding.
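
The difference between the two RFC 4648 alphabets can be seen with Python's standard `base64` module. This is just a sketch; the input bytes are chosen so that values 62 and 63 both occur, and the variable names are illustrative:

```python
import base64

# 0xFB 0xFF splits into the 6-bit groups 62, 63, 60, so both
# alphabet-specific characters show up in the output.
data = b"\xfb\xff"

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=
```

A character filter written for one alphabet would therefore reject valid output from the other.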

  • 2020-12-03 01:12

    You are probably safe with the other answers in most situations, but according to the Wikipedia article on Base64 there is no definitive list you can rely on:

    The particular choice of character set selected for the 64 characters required for the base varies between implementations.

    RFC 4648 mentions other alphabets, such as the "URL and Filename safe" Base 64 Alphabet, where + and / are replaced with - and _.

    There's a table of Base64 variants that use different characters. Keep in mind that there are implementation-specific rules about line separators, which you can find in the same table. Some implementations, like MIME, even allow (and ignore) characters that are not in the alphabet.
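
As one example of a lenient decoder, Python's `base64.b64decode` discards characters outside the alphabet by default and rejects them only when `validate=True` is passed. This shows one library's behavior, not MIME itself, and the helper names `decode_lenient` and `decode_strict` are made up for this sketch:

```python
import base64
import binascii


def decode_lenient(text):
    """Decode, silently discarding characters outside the alphabet."""
    return base64.b64decode(text)


def decode_strict(text):
    """Decode, raising binascii.Error on any non-alphabet character."""
    return base64.b64decode(text, validate=True)


print(decode_lenient("SGVs\nbG8="))  # line break is ignored
print(decode_lenient("SGVs*bG8="))   # '*' is ignored as well

try:
    decode_strict("SGVs*bG8=")
except binascii.Error as exc:
    print("rejected:", exc)
```

So whether a stray character such as `*` "can occur" in a Base64 string depends on how strict the decoder is as well as on the encoder.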

  • 2020-12-03 01:16

    Base64 only contains A–Z, a–z, 0–9, +, / and =. So the list of characters not to be used is: all possible characters minus the ones mentioned above.

    For special purposes, . and _ are possible, too.

  • 2020-12-03 01:20

    https://en.wikipedia.org/wiki/Base64#Design

    MIME's Base64 implementation uses A–Z, a–z, and 0–9 for the first 62 values

    So for the most part you should expect only alphanumeric characters. The example table in that article also shows '+' and '/'; it's unlikely you would see '*'.

    You can use http://www.motobit.com/util/base64-decoder-encoder.asp to convert to Base64, for example; for '*' it returns "Kg==".
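
The same check can be done locally. Here is a small sketch using Python's standard `base64` module, with the function name `encode_text` chosen only for illustration:

```python
import base64


def encode_text(text):
    """Base64-encode the UTF-8 bytes of text and return an ASCII string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


encoded = encode_text("*")
print(encoded)         # Kg==
print("*" in encoded)  # False
```

The character `*` is valid input, but it never appears in the output of the standard alphabet.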
