Is it ok to remove the equal signs from a base64 string?

Submitted by 心不动则不痛 on 2020-08-21 04:42:16

Question


I have a string that I'm encoding into base64 to conserve space. Is it a big deal if I remove the equal sign at the end? Would this significantly decrease entropy? What can I do to ensure the length of the resulting string is fixed?

>>> base64.b64encode(combined.digest(), altchars="AB")
'PeFC3irNFx8fuzwjAzAfEAup9cz6xujsf2gAIH2GdUM='

Thanks.


Answer 1:


Looking at your code:

>>> base64.b64encode(combined.digest(), altchars="AB")
'PeFC3irNFx8fuzwjAzAfEAup9cz6xujsf2gAIH2GdUM='

The string that's being encoded in base64 is the result of a function called digest(). If your digest function produces fixed-length values (e.g. if it's calculating MD5 or SHA1 digests), then the parameter to b64encode will always be the same length.

If the above is true, then you can strip off the trailing equals signs, because there will always be the same number of them. If you do that, simply append the same number of equals signs to the string before you decode.

If the digest is not a fixed length, then it's not safe to trim the equals signs.

Edit: Looks like you might be using a SHA-256 digest? The SHA-256 digest is 256 bits (or 32 bytes). 32 bytes is 10 groups of 3, plus two left over. As you'll see from the Wikipedia section on padding, that means you always have exactly one trailing equals sign. If it is SHA-256, then it's OK to strip it, so long as you remember to add it again before decoding.
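A minimal Python 3 sketch of the fixed-length case, assuming the digest really is SHA-256 (the answer only guesses this) and using the standard base64 alphabet rather than the question's altchars. The helper names encode_digest_unpadded and decode_digest_unpadded are hypothetical.

```python
import base64
import hashlib

def encode_digest_unpadded(data: bytes) -> str:
    # SHA-256 always yields 32 bytes: 10 full groups of 3 plus 2 spare
    # bytes, so the base64 text is always 44 chars ending in one "=".
    digest = hashlib.sha256(data).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")

def decode_digest_unpadded(text: str) -> bytes:
    # The stripped part is always exactly one "=", so add it back.
    return base64.b64decode(text + "=")

token = encode_digest_unpadded(b"hello")
print(len(token))  # 43
assert decode_digest_unpadded(token) == hashlib.sha256(b"hello").digest()
```

Because the length never changes, the stripped string is always 43 characters, which also answers the question's "fixed length" concern for this digest.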




Answer 2:


Every 3 bytes you need to encode as Base64 are converted to 4 ASCII characters, and the '=' character is used to pad the result so that the encoded length is always a multiple of 4 characters. If you have an exact multiple of 3 bytes, you get no equals sign. One spare byte means you get two '=' characters at the end; two spare bytes means you get one '=' character at the end. Depending on how you decode the string, the decoder may or may not accept it as valid once the padding is removed. With your example string (padding removed) it doesn't decode, but some simple strings I've tried do decode.
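The padding rule above can be checked directly with Python 3's base64 module. This is only an illustration; expected_padding is a hypothetical helper that restates the rule in code.

```python
import base64

def expected_padding(n_bytes: int) -> int:
    # 0 spare bytes -> no "=", 1 spare byte -> "==", 2 spare bytes -> "="
    return (3 - n_bytes % 3) % 3

for n in range(1, 7):
    encoded = base64.b64encode(b"x" * n)
    # Output length is always a multiple of 4 thanks to the padding.
    assert len(encoded) % 4 == 0
    assert encoded.count(b"=") == expected_padding(n)
    print(n, encoded)
```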

You can read this page for a better understanding of base64 strings and encoding/decoding.

http://www.nczonline.net/blog/2009/12/08/computer-science-in-javascript-base64-encoding/

There are free online encoders/decoders that you can use to check your output string.




Answer 3:


It's fine to remove the equals signs, as long as you know what they do.

Base64 outputs 4 characters for every 3 bytes it encodes (in other words, each character encodes 6 bits). The padding characters are added so that any base64 string is always a multiple of 4 in length; the padding characters don't actually encode any data. (I can't say for sure why this was done: as a way of error checking if a string was truncated, to ease decoding, or something else?)

In any case, that means if you have x base64 characters (sans padding), there will be (4 - x % 4) % 4 padding characters, which is the same as -x % 4 in Python: none when x % 4 == 0, two when x % 4 == 2, and one when x % 4 == 3. (x % 4 == 1 will never happen, due to the factorization of 6 and 8.) Since these contain no actual data and can be recovered, I frequently strip them off when I want to save space, e.g. the following:

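from base64 import b64encode, b64decode

# encode data (b64encode returns bytes in Python 3, so strip b"=")
raw = b'\x00\x01'
enc = b64encode(raw).rstrip(b"=")

# func to restore padding: -len(data) % 4 is 0, 1 or 2 for unpadded base64
def repad(data):
    return data + b"=" * (-len(data) % 4)

raw = b64decode(repad(enc))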
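Because the amount of padding depends only on the length of the unpadded string, stripping and restoring it round-trips for any input length, not just fixed-length digests. A minimal Python 3 check over random inputs; encode_unpadded and decode_unpadded are hypothetical names wrapping the same idea as repad above.

```python
import os
from base64 import b64decode, b64encode

def encode_unpadded(raw: bytes) -> bytes:
    # Padding carries no data, so it is safe to drop it.
    return b64encode(raw).rstrip(b"=")

def decode_unpadded(enc: bytes) -> bytes:
    # -len(enc) % 4 restores 0, 1 or 2 "=" characters.
    return b64decode(enc + b"=" * (-len(enc) % 4))

# Round-trip random inputs of every length from 0 to 49 bytes.
for n in range(50):
    raw = os.urandom(n)
    enc = encode_unpadded(raw)
    assert len(enc) % 4 != 1  # the x % 4 == 1 case never occurs
    assert decode_unpadded(enc) == raw
```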



Answer 4:


Those are padding characters, and you don't save much by removing them since there are at most two of them, so if you want to save space, look elsewhere. And given the reference to entropy, are you compressing these base64 strings? If so, even if you do remove them, they will not have much of an effect on the compressed size.
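The "at most two" claim follows from the encoded lengths: n bytes take 4 * ceil(n / 3) characters with padding and ceil(4 * n / 3) without. A short Python 3 sketch that checks both formulas against the real encoder; the function names are hypothetical.

```python
import base64
import os

def padded_length(n_bytes: int) -> int:
    # Standard base64: 4 characters per started group of 3 bytes.
    return 4 * ((n_bytes + 2) // 3)

def unpadded_length(n_bytes: int) -> int:
    # Without "=": ceil(8 * n / 6) characters, i.e. ceil(4 * n / 3).
    return (4 * n_bytes + 2) // 3

for n in range(40):
    encoded = base64.b64encode(os.urandom(n))
    assert len(encoded) == padded_length(n)
    assert len(encoded.rstrip(b"=")) == unpadded_length(n)
    # Stripping the padding never saves more than two characters.
    assert padded_length(n) - unpadded_length(n) <= 2
```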




Answer 5:


Other than in the case @Martin Ellis points out, messing with the padding characters can lead to getting a

TypeError: Incorrect padding

(raised as binascii.Error in Python 3), and producing some garbage while you're at it.
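A small Python 3 illustration of that failure mode: decoding a string whose "=" was stripped raises binascii.Error (a ValueError subclass) unless the padding is restored first. try_decode is a hypothetical helper name used only for this sketch.

```python
import base64
import binascii
from typing import Optional

def try_decode(text: bytes) -> Optional[bytes]:
    # Return the decoded bytes, or None if the decoder rejects the input.
    try:
        return base64.b64decode(text)
    except binascii.Error as exc:
        print("decode failed:", exc)
        return None

encoded = base64.b64encode(b"\x00\x01")  # b'AAE='
print(try_decode(encoded))               # b'\x00\x01'
print(try_decode(encoded.rstrip(b"=")))  # decode failed, then None
```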

As stated by @MattH, base64 will do the opposite of conserving space.

Instead, to conserve space, you should apply a compression algorithm such as zlib.

For example, with zlib:

import zlib

# zlib works on bytes in Python 3, hence the b'' literal
s = b'''large string....'''
compressed = zlib.compress(s)

compression_ratio = len(s) / len(compressed)

# And later...
out = zlib.decompress(compressed) 

# The above function is also good for relieving stress.



Answer 6:


I don't think so.
https://en.wikipedia.org/wiki/Base64#Output_padding

These equals are "useful".



Source: https://stackoverflow.com/questions/9020409/is-it-ok-to-remove-the-equal-signs-from-a-base64-string
