Is it ok to remove the equal signs from a base64 string?

眼角桃花 2021-02-02 10:05

I have a string that I'm encoding into base64 to conserve space. Is it a big deal if I remove the equal sign at the end? Would this significantly decrease entropy? What can I d

6 answers
  • 2021-02-02 10:33

    Every 3 bytes you encode as Base64 are converted to 4 ASCII characters, and the '=' character pads the result so that the encoded length is always a multiple of 4. If the input is an exact multiple of 3 bytes, you get no equals sign. One spare byte means you get two '=' characters at the end; two spare bytes mean you get one. Depending on how you decode the string, the decoder may or may not accept it as valid once the padding is gone. With the example string you have, it doesn't decode, but some simple strings I've tried do decode.
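
    The padding rule above can be checked directly. This is a minimal sketch in Python 3 using the standard-library base64 module; padding_count is a hypothetical helper name chosen for illustration.

```python
import base64

def padding_count(n_bytes):
    """Number of '=' characters expected for an input of n_bytes bytes."""
    # 0 spare bytes -> 0 pads, 1 spare byte -> 2 pads, 2 spare bytes -> 1 pad
    return (-n_bytes) % 3

for n in range(7):
    encoded = base64.b64encode(b"x" * n)
    # The padded output length is always a multiple of 4.
    assert len(encoded) % 4 == 0
    assert encoded.count(b"=") == padding_count(n)
    print(n, encoded.decode("ascii"))
```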

    You can read this page for a better understanding of base64 strings and encoding/decoding.

    http://www.nczonline.net/blog/2009/12/08/computer-science-in-javascript-base64-encoding/

    There are free online encoder/decoders that you can use to check your output string

  • 2021-02-02 10:41

    Other than in the case @Martin Ellis points out, messing with the padding characters can lead to getting a

    TypeError: Incorrect padding
    

    and producing some garbage while you're at it. (In Python 3 the same failure is raised as binascii.Error: Incorrect padding.)

    As stated by @MattH, base64 will do the opposite of conserving space.
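
    Both points can be reproduced in a few lines. This is a sketch assuming Python 3, where the standard-library decoder raises binascii.Error rather than TypeError; the sample bytes are arbitrary.

```python
import base64
import binascii

raw = b"\x00\x01"                      # 2 bytes, so the encoding ends in one '='
encoded = base64.b64encode(raw)
stripped = encoded.rstrip(b"=")

# Decoding without the padding fails in the standard-library decoder.
try:
    base64.b64decode(stripped)
    decode_failed = False
except binascii.Error:
    decode_failed = True

# Base64 grows the data: every 3 input bytes become 4 output characters.
big = base64.b64encode(b"x" * 300)
print(decode_failed, len(big))
```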

    To conserve space, apply a compression algorithm such as zlib instead.

    For example, with zlib:

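    import zlib
    
    # zlib operates on bytes, so encode text before compressing
    s = '''large string....'''.encode('utf-8')
    compressed = zlib.compress(s)
    
    # a ratio above 1 means the compressed form is smaller
    compression_ratio = len(s) / len(compressed)
    
    # And later...
    out = zlib.decompress(compressed).decode('utf-8')
    
    # The above function is also good for relieving stress.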
    
  • 2021-02-02 10:44

    I don't think so.
    https://en.wikipedia.org/wiki/Base64#Output_padding

    These equals signs are "useful".

  • 2021-02-02 10:55

    It's fine to remove the equals signs, as long as you know what they do.

    Base64 outputs 4 characters for every 3 bytes it encodes (in other words, each character encodes 6 bits). The padding characters are added so that the length of any base64 string is always a multiple of 4; they don't actually encode any data. (I can't say for sure why this was done: as a way of error checking if a string was truncated, to ease decoding, or something else?)

    In any case, that means if you have x base64 characters (sans padding), there will be (-x)%4 padding characters, i.e. 0 when x%4=0, 2 when x%4=2, and 1 when x%4=3. (x%4=1 will never happen due to the factorization of 6 and 8: a single leftover character would carry only 6 bits, less than one byte.) Since these contain no actual data and can be recovered, I frequently strip them off when I want to save space, e.g. the following:

    from base64 import b64encode, b64decode
    
    # encode data and strip the padding (b64encode returns bytes in Python 3)
    raw = b'\x00\x01'
    enc = b64encode(raw).rstrip(b"=")
    
    # func to restore padding
    def repad(data):
        return data + b"=" * (-len(data) % 4)
    
    raw = b64decode(repad(enc))
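
    As a sanity check on the strip-and-repad idea, the round trip can be exercised over many input lengths. This is a sketch for Python 3; the repad helper is redefined here so the block runs on its own, and the random test data is only illustrative.

```python
import os
from base64 import b64decode, b64encode

def repad(data):
    """Restore the '=' padding that was stripped from base64 bytes."""
    return data + b"=" * (-len(data) % 4)

for n in range(64):
    raw = os.urandom(n)
    stripped = b64encode(raw).rstrip(b"=")
    # An unpadded length is never 1 more than a multiple of 4.
    assert len(stripped) % 4 != 1
    # Re-padding recovers exactly the original bytes.
    assert b64decode(repad(stripped)) == raw
```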
    
  • 2021-02-02 10:58

    Looking at your code:

    >>> base64.b64encode(combined.digest(), altchars="AB")
    'PeFC3irNFx8fuzwjAzAfEAup9cz6xujsf2gAIH2GdUM='
    

    The string that's being encoded in base64 is the result of a function called digest(). If your digest function is producing fixed length values (e.g. if it's calculating MD5 or SHA1 digests), then the parameter to b64encode will always be the same length.

    If the above is true, then you can strip off the trailing equals signs, because there will always be the same number of them. If you do that, simply append the same number of equals signs to the string before you decode.

    If the digest is not a fixed length, then it's not safe to trim the equals signs.

    Edit: Looks like you might be using a SHA-256 digest? The SHA-256 digest is 256 bits (or 32 bytes). 32 bytes is 10 groups of 3, plus two left over. As you'll see from the Wikipedia section on padding, that means you always have exactly one trailing equals sign. If it is SHA-256, then it's OK to strip it, so long as you remember to add it again before decoding.
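
    The arithmetic for the fixed-length case can be confirmed with hashlib. This is a sketch assuming the digest really is SHA-256 and using the standard alphabet rather than the altchars from the question; the input bytes are a placeholder.

```python
import base64
import hashlib

digest = hashlib.sha256(b"example input").digest()   # always 32 bytes
encoded = base64.b64encode(digest)                   # 44 characters, one '='

# Strip the single, predictable padding character...
short = encoded[:-1]

# ...and append it again before decoding.
restored = base64.b64decode(short + b"=")
print(len(digest), len(encoded), len(short))
```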

  • 2021-02-02 10:59

    Those are padding, and you don't save much by removing them because there are at most two of them, so if you want to save space, look elsewhere. And given the reference to entropy, are you compressing these base64 strings? If so, removing the padding will not have much of an effect on the compressed size.
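
    To put numbers on how little the padding costs, here is a sketch in Python 3 with a made-up 1000-byte payload; the payload contents are a stand-in, not data from the question.

```python
import base64

payload = bytes(1000)                    # 1000 zero bytes as stand-in data
encoded = base64.b64encode(payload)
stripped = encoded.rstrip(b"=")

saved = len(encoded) - len(stripped)     # characters saved by dropping '='
overhead = len(encoded) - len(payload)   # characters added by base64 itself
print(saved, overhead)
```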
