I am noticing that whenever I base64 encode a string, a \"=\" is appended at the end. Can I remove this character and then reliably decode it later by adding it back, or is
If you're encoding bytes (at fixed bit length), then the padding is redundant. This is the case for most people.
Base64 consumes 6 bits at a time and produces a byte of 8 bits that only uses six bits worth of combinations.
If your string is 1 byte (8 bits), you'll have an output of 12 bits as the smallest multiple of 6 that 8 will fit into, with 4 bits extra. If your string is 2 bytes, you have to output 18 bits, with two bits extra. For multiples of six against multiple of 8 you can have a remainder of either 0, 2 or 4 bits.
The padding says to ignore those extra four (==) or two (=) bits. The padding is there tell the decoder about your padding.
The padding isn't really needed when you're encoding bytes. A base64 encoder can simply ignore left over bits that total less than 8 bits. In this case, you're best off removing it.
The padding might be of some use for streaming and arbitrary length bit sequences as long as they're a multiple of two. It might also be used for cases where people want to only send the last 4 bits when more bits are remaining if the remaining bits are all zero. Some people might want to use it to detect incomplete sequences though it's hardly reliable for that. I've never seen this optimisation in practice. People rarely have these situations, most people use base64 for discrete byte sequences.
If you see answers suggesting to leave it on, that's not a good encouragement if you're simply encoding bytes, it's enabling a feature for a set of circumstances you don't have. The only reason to have it on in that case might be to add tolerance to decoders that don't work without the padding. If you control both ends, that's a non-concern.
If you're using PHP the following function will revert the stripped string to its original format with proper padding:
<?php
$str = 'base64 encoded string without equal signs stripped';
$str = str_pad($str, strlen($str) + (4 - ((strlen($str) % 4) ?: 4)), '=');
echo $str, "\n";
Using Python you can remove base64 padding and add it back like this:
from math import ceil
stripped = original.rstrip('=')
original = stripped.ljust(ceil(len(stripped) / 4) * 4, '=')
The =
is padding.
Wikipedia says
An additional pad character is allocated which may be used to force the encoded output into an integer multiple of 4 characters (or equivalently when the unencoded binary text is not a multiple of 3 bytes) ; these padding characters must then be discarded when decoding but still allow the calculation of the effective length of the unencoded text, when its input binary length would not be a multiple of 3 bytes (the last non-pad character is normally encoded so that the last 6-bit block it represents will be zero-padded on its least significant bits, at most two pad characters may occur at the end of the encoded stream).
If you control the other end, you could remove it when in transport, then re-insert it (by checking the string length) before decoding.
Note that the data will not be valid Base64 in transport.
I wrote part of Apache's commons-codec-1.4.jar Base64 decoder, and in that logic we are fine without padding characters. End-of-file and End-of-stream are just as good indicators that the Base64 message is finished as any number of '=' characters!
The URL-Safe variant we introduced in commons-codec-1.4 omits the padding characters on purpose to keep things smaller!
http://commons.apache.org/codec/apidocs/src-html/org/apache/commons/codec/binary/Base64.html#line.478
I guess a safer answer is, "depends on your decoder implementation," but logically it is not hard to write a decoder that doesn't need padding.
In JavaScript you could do something like this:
// if this is your Base64 encoded string
var str = 'VGhpcyBpcyBhbiBhd2Vzb21lIHNjcmlwdA==';
// make URL friendly:
str = str.replace(/\+/g, '-').replace(/\//g, '_').replace(/\=+$/, '');
// reverse to original encoding
if (str.length % 4 != 0){
str += ('===').slice(0, 4 - (str.length % 4));
}
str = str.replace(/-/g, '+').replace(/_/g, '/');
See also this Fiddle: http://jsfiddle.net/7bjaT/66/