Using JavaScript's atob to decode base64 doesn't properly decode UTF-8 strings

野趣味 2020-11-22 16:24

I'm using the JavaScript window.atob() function to decode a base64-encoded string (specifically the base64-encoded content from the GitHub API). Problem is I'

10 answers
  • 2020-11-22 16:33

    If you still have problems after trying the solutions above, try the following. It covers the case where escape is not available in TypeScript.

    blob = new Blob(["\ufeff", csv_content]); // the leading BOM makes the symbols display correctly in Excel
    

    For csv_content, decode the base64 string like this:

    function b64DecodeUnicode(str: string): string {
        return decodeURIComponent(atob(str).split('').map((c: string) => {
            return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
        }).join(''));
    }
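
    The two pieces above can be combined roughly as follows. This is only a sketch: the helper name makeCsvBlob and the MIME type are assumptions made for illustration, and the base64 input is assumed to hold UTF-8 CSV text.

```javascript
// Decode base64 whose bytes are UTF-8 text (same percent-encoding approach as above).
function b64DecodeUnicode(str) {
    return decodeURIComponent(atob(str).split('').map(function (c) {
        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
    }).join(''));
}

// Hypothetical helper: wrap the decoded CSV text in a Blob with a leading BOM.
function makeCsvBlob(base64Csv) {
    const csvContent = b64DecodeUnicode(base64Csv);
    // Blob serializes strings as UTF-8, so "\ufeff" becomes the bytes EF BB BF.
    return new Blob(['\ufeff', csvContent], { type: 'text/csv;charset=utf-8' });
}
```

    The BOM adds three bytes in front of the CSV data, so the Blob is always three bytes larger than the decoded content.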
    
  • 2020-11-22 16:39

    Decoding base64 to UTF8 String

    Below is the current most-voted answer, by @brandonscript:

    function b64DecodeUnicode(str) {
        // Going backwards: from bytestream, to percent-encoding, to original string.
        return decodeURIComponent(atob(str).split('').map(function(c) {
            return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
        }).join(''));
    }
    

    The code above works, but it is very slow. For a very large input, for example a 30,000-character base64 string holding an HTML document, it needs a lot of computation.

    Here is my answer. It uses the built-in TextDecoder and is nearly 10x faster than the code above for large input.

    function decodeBase64(base64) {
        const text = atob(base64);
        const length = text.length;
        const bytes = new Uint8Array(length);
        for (let i = 0; i < length; i++) {
            bytes[i] = text.charCodeAt(i);
        }
        const decoder = new TextDecoder(); // default is utf-8
        return decoder.decode(bytes);
    }
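
    For the opposite direction, a matching encoder can be sketched with TextEncoder. The name encodeBase64 is an assumption chosen to mirror decodeBase64 above; it is not part of the original answer.

```javascript
// Encode a JavaScript string as the base64 of its UTF-8 bytes.
function encodeBase64(text) {
    const bytes = new TextEncoder().encode(text); // TextEncoder always produces UTF-8
    let binary = '';
    for (let i = 0; i < bytes.length; i++) {
        binary += String.fromCharCode(bytes[i]); // one character per byte, as btoa expects
    }
    return btoa(binary);
}

// Decoder from the answer above, repeated so the sketch is self-contained.
function decodeBase64(base64) {
    const text = atob(base64);
    const bytes = new Uint8Array(text.length);
    for (let i = 0; i < text.length; i++) {
        bytes[i] = text.charCodeAt(i);
    }
    return new TextDecoder().decode(bytes);
}

const encoded = encodeBase64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
const decoded = decodeBase64(encoded);       // "✓ à la mode"
```

    By default TextDecoder replaces invalid byte sequences with U+FFFD instead of throwing. If malformed input should be an error, new TextDecoder('utf-8', { fatal: true }) can be used instead.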
    
  • 2020-11-22 16:40

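    Small correction: unescape and escape are deprecated. Simply swapping in decodeURIComponent and encodeURIComponent does not work, because decodeURIComponent(encodeURIComponent(str)) just returns str. Convert between percent escapes and raw bytes instead:

    function utf8_to_b64( str ) {
        // Percent-encode as UTF-8, then turn each %XX escape into one byte-sized character.
        return window.btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function( match, p1 ) {
            return String.fromCharCode(parseInt(p1, 16));
        }));
    }
    
    function b64_to_utf8( str ) {
        str = str.replace(/\s/g, ''); // strip whitespace before decoding
        // Turn each byte into a %XX escape, then decode the escapes as UTF-8.
        return decodeURIComponent(window.atob(str).split('').map(function( c ) {
            return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
        }).join(''));
    }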
    
  • 2020-11-22 16:42

    There's a great article on Mozilla's MDN docs that describes exactly this issue:

    The "Unicode Problem": Since DOMStrings are 16-bit-encoded strings, in most browsers calling window.btoa on a Unicode string will cause a Character Out Of Range exception if a character exceeds the range of an 8-bit byte (0x00~0xFF). There are two possible methods to solve this problem:

    • the first one is to escape the whole string (with UTF-8, see encodeURIComponent) and then encode it;
    • the second one is to convert the UTF-16 DOMString to an UTF-8 array of characters and then encode it.
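
    To make the problem and the two methods concrete, here is a small sketch. The function names viaPercentEncoding and viaUtf8Bytes are made up for this illustration, and the second method assumes TextEncoder is available.

```javascript
// btoa only accepts characters in the range 0x00-0xFF, so this call throws.
let threw = false;
try {
    btoa('✓');
} catch (e) {
    threw = true;
}

// Method 1: percent-encode as UTF-8, then turn each %XX into one byte-sized character.
function viaPercentEncoding(str) {
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function (match, hex) {
        return String.fromCharCode(parseInt(hex, 16));
    }));
}

// Method 2: convert the string to UTF-8 bytes first, then base64-encode those bytes.
function viaUtf8Bytes(str) {
    const bytes = new TextEncoder().encode(str);
    const binary = Array.from(bytes, function (b) {
        return String.fromCharCode(b);
    }).join('');
    return btoa(binary);
}

const first = viaPercentEncoding('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
const second = viaUtf8Bytes('✓ à la mode');      // "4pyTIMOgIGxhIG1vZGU="
```

    Both methods hand btoa the same sequence of UTF-8 bytes, which is why they produce identical output.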

    A note on previous solutions: the MDN article originally suggested using unescape and escape to solve the Character Out Of Range exception problem, but they have since been deprecated. Some other answers here have suggested working around this with decodeURIComponent and encodeURIComponent, but that has proven to be unreliable and unpredictable. The most recent update to this answer uses modern JavaScript functions to improve speed and modernize the code.

    If you're trying to save yourself some time, you could also consider using a library:

    • js-base64 (NPM, great for Node.js)
    • base64-js

    Encoding UTF8 ⇢ base64

    function b64EncodeUnicode(str) {
        // first we use encodeURIComponent to get percent-encoded UTF-8,
        // then we convert the percent encodings into raw bytes which
        // can be fed into btoa.
        return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
            function toSolidBytes(match, p1) {
                return String.fromCharCode('0x' + p1);
        }));
    }
    
    b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
    b64EncodeUnicode('\n'); // "Cg=="
    

    Decoding base64 ⇢ UTF8

    function b64DecodeUnicode(str) {
        // Going backwards: from bytestream, to percent-encoding, to original string.
        return decodeURIComponent(atob(str).split('').map(function(c) {
            return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
        }).join(''));
    }
    
    b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
    b64DecodeUnicode('Cg=='); // "\n"
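
    It may help to see the intermediate values that b64DecodeUnicode passes through. The variable names below are chosen for illustration only.

```javascript
const base64 = '4pyTIMOgIGxhIG1vZGU=';

// Step 1: atob yields a "binary string" with one character per decoded byte.
const binary = atob(base64); // 14 characters, one for each UTF-8 byte

// Step 2: each byte becomes a two-digit percent escape.
const percentEncoded = binary.split('').map(function (c) {
    return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join('');
// "%e2%9c%93%20%c3%a0%20%6c%61%20%6d%6f%64%65"

// Step 3: decodeURIComponent interprets the escapes as UTF-8.
const text = decodeURIComponent(percentEncoded); // "✓ à la mode"
```

    Note that decodeURIComponent throws a URIError if the bytes are not valid UTF-8, so this approach fails loudly on base64 data that is not UTF-8 text.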
    

    The pre-2018 solution (still functional and likely better supported in older browsers, but not up to date)

    Here is the current recommendation, direct from MDN, with some additional TypeScript compatibility via @MA-Maddin:

    // Encoding UTF8 ⇢ base64
    
    function b64EncodeUnicode(str) {
        return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match, p1) {
            return String.fromCharCode(parseInt(p1, 16))
        }))
    }
    
    b64EncodeUnicode('✓ à la mode') // "4pyTIMOgIGxhIG1vZGU="
    b64EncodeUnicode('\n') // "Cg=="
    
    // Decoding base64 ⇢ UTF8
    
    function b64DecodeUnicode(str) {
        return decodeURIComponent(Array.prototype.map.call(atob(str), function(c) {
            return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2)
        }).join(''))
    }
    
    b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU=') // "✓ à la mode"
    b64DecodeUnicode('Cg==') // "\n"
    

    The original solution (deprecated)

    This used escape and unescape (which are now deprecated, though this still works in all modern browsers):

    function utf8_to_b64( str ) {
        return window.btoa(unescape(encodeURIComponent( str )));
    }
    
    function b64_to_utf8( str ) {
        return decodeURIComponent(escape(window.atob( str )));
    }
    
    // Usage:
    utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
    b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
    

    And one last thing: I first encountered this problem when calling the GitHub API. To get this to work on (Mobile) Safari properly, I actually had to strip all white space from the base64 source before I could even decode the source. Whether or not this is still relevant in 2017, I don't know:

    function b64_to_utf8( str ) {
        str = str.replace(/\s/g, '');    
        return decodeURIComponent(escape(window.atob( str )));
    }
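
    The same whitespace stripping can be done without escape. The following is a sketch only: decodeWrappedBase64 is a hypothetical helper, the sample input is made up (the base64 of "✓ à la mode" split across lines), and TextDecoder is assumed to be available.

```javascript
// Hypothetical helper for a base64 payload that contains line breaks.
function decodeWrappedBase64(content) {
    const compact = content.replace(/\s/g, ''); // drop newlines and spaces
    const binary = atob(compact);
    // Copy each character code (0-255) into a byte array.
    const bytes = Uint8Array.from(binary, function (c) {
        return c.charCodeAt(0);
    });
    return new TextDecoder().decode(bytes); // decodes as UTF-8 by default
}

const wrapped = '4pyTIMOg\nIGxhIG1v\nZGU=\n';
const result = decodeWrappedBase64(wrapped); // "✓ à la mode"
```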
    
  • 2020-11-22 16:42

    The complete article that worked for me: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Base64_encoding_and_decoding

    The part that covers encoding from Unicode/UTF-8 is:

    function utf8_to_b64( str ) {
       return window.btoa(unescape(encodeURIComponent( str )));
    }
    
    function b64_to_utf8( str ) {
       return decodeURIComponent(escape(window.atob( str )));
    }
    
    // Usage:
    utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
    b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
    

    This is one of the most widely used methods, though note that escape and unescape are deprecated.

  • 2020-11-22 16:45

    Here's some future-proof code for browsers that may lack escape/unescape(). Note that IE 9 and older don't support atob/btoa(), so you'd need to use custom base64 functions for them.

    // Polyfill for escape/unescape
    if( !window.unescape ){
        window.unescape = function( s ){
            // Accept both upper- and lowercase hex digits in %XX escapes.
            return s.replace( /%([0-9A-Fa-f]{2})/g, function( m, p ) {
                return String.fromCharCode( parseInt( p, 16 ) );
            } );
        };
    }
    if( !window.escape ){
        window.escape = function( s ){
            var chr, hex, i = 0, l = s.length, out = '';
            for( ; i < l; i ++ ){
                chr = s.charAt( i );
                if( chr.search( /[A-Za-z0-9\@\*\_\+\-\.\/]/ ) > -1 ){
                    out += chr; continue; }
                hex = s.charCodeAt( i ).toString( 16 ).toUpperCase(); // native escape emits uppercase hex
                out += '%' + ( hex.length % 2 != 0 ? '0' : '' ) + hex;
            }
            return out;
        };
    }
    
    // Base64 encoding of UTF-8 strings
    var utf8ToB64 = function( s ){
        return btoa( unescape( encodeURIComponent( s ) ) );
    };
    var b64ToUtf8 = function( s ){
        return decodeURIComponent( escape( atob( s ) ) );
    };
    

    A more comprehensive example for UTF-8 encoding and decoding can be found here: http://jsfiddle.net/47zwb41o/
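
    For environments without atob, a custom base64 decoder could look roughly like the sketch below. The names atobFallback and b64ToUtf8Fallback are hypothetical, the code is written in ES5 style for older browsers, and it only handles the standard base64 alphabet.

```javascript
var B64_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical stand-in for atob: returns a binary string (one character per byte).
function atobFallback(input) {
    var str = String(input).replace(/[\s=]+/g, ''); // ignore whitespace and padding
    var out = '';
    var buffer = 0; // bits collected so far
    var bits = 0;   // number of valid bits in buffer
    for (var i = 0; i < str.length; i++) {
        var value = B64_CHARS.indexOf(str.charAt(i));
        if (value === -1) {
            throw new Error('Invalid base64 character: ' + str.charAt(i));
        }
        buffer = (buffer << 6) | value; // each base64 character carries 6 bits
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += String.fromCharCode((buffer >> bits) & 0xFF); // emit one full byte
            buffer &= (1 << bits) - 1; // keep only the leftover bits
        }
    }
    return out;
}

// Decode base64 to a UTF-8 string without relying on atob or escape.
function b64ToUtf8Fallback(s) {
    var binary = atobFallback(s);
    var encoded = '';
    for (var i = 0; i < binary.length; i++) {
        encoded += '%' + ('00' + binary.charCodeAt(i).toString(16)).slice(-2);
    }
    return decodeURIComponent(encoded);
}

b64ToUtf8Fallback('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
```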
