decodeURIComponent vs unescape, what is wrong with unescape?

醉话见心 2020-11-29 23:32

In answering another question I became aware that my JavaScript/DOM knowledge had become a bit out of date, in that I am still using escape/unescape.

4 answers
  • 2020-11-30 00:13

    The best answer is the approach used by the online tool at http://meyerweb.com/eric/tools/dencoder/:

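    // Decode the contents of the #dencoder field in place.
    function decode() {
        var obj = document.getElementById('dencoder');
        var encoded = obj.value;
        // Form encoding writes a space as "+", which decodeURIComponent leaves alone,
        // so turn each "+" back into a space before decoding.
        obj.value = decodeURIComponent(encoded.replace(/\+/g, " "));
    }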
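The same idea without the DOM, as a minimal sketch. `decodeFormValue` is a hypothetical helper name used here for illustration; it is not part of the linked tool.

```javascript
// Hypothetical helper: decode one form-encoded value
// (application/x-www-form-urlencoded style).
function decodeFormValue(encoded) {
    // "+" stands for a space in form encoding; a literal plus arrives as %2B.
    return decodeURIComponent(encoded.replace(/\+/g, " "));
}

console.log(decodeFormValue("a+b%2Bc")); // "a b+c"
```

The replacement has to happen before decoding. Done afterwards, it would also turn the literal plus produced from %2B into a space.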
    
  • 2020-11-30 00:16

    escape's %XX form covers only characters in the range 0 to 255 inclusive (ISO-8859-1, which is effectively the Unicode code points representable with a single byte). (*)

    encodeURIComponent works for all well-formed strings JavaScript can represent, which covers the whole Unicode range and not just the Basic Multilingual Plane, i.e. Unicode code points 0 to 1,114,111 (0x10FFFF), enough for almost any human writing system in current use.

    Both functions produce URL-safe strings that only use code points 0 to 127 inclusive (US-ASCII). encodeURIComponent accomplishes this by first encoding the string as UTF-8 and then applying the %XX hex encoding familiar from escape to every byte that would not be URL safe.
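A short sketch comparing the two outputs for a few sample characters. The sample strings and variable names are chosen here for illustration only.

```javascript
// escape: %XX for code units up to 255, %uXXXX for code units above that.
// encodeURIComponent: one %XX per UTF-8 byte.
const samples = ["\u00FC", "\u20AC", "\u{1F600}"]; // ü, €, 😀
const rows = samples.map((s) => ({
    input: s,
    escaped: escape(s),
    uriEncoded: encodeURIComponent(s),
}));

for (const row of rows) {
    console.log(row.input, row.escaped, row.uriEncoded);
}
// ü   %FC            %C3%BC
// €   %u20AC         %E2%82%AC
// 😀  %uD83D%uDE00   %F0%9F%98%80
```

Only the right-hand column is what a server expecting UTF-8 percent-encoding will decode correctly.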

    This is incidentally why you can build a UTF-8 encoder/decoder in JavaScript from two function calls, without any loops or garbage generation: combining these primitives cancels out everything but the UTF-8 processing. The unescape and decodeURIComponent counterparts do the same in reverse.

    (*) Footnote: For characters above 255, which the %XX form cannot express, browsers such as Google Chrome produce %uXXXX instead, but web server support for decoding that format is much weaker than for the IETF-standardized UTF-8 based encoding.

  • 2020-11-30 00:23

    Another "modern" use I've run into is parsing a URI-encoded string that may include invalid UTF-8 byte sequences. In those cases decodeURIComponent throws a URIError. You may need to catch this exception and fall back to using unescape.

    An example would be 'tür' encoded as 't%FCr', which I've seen Firefox produce (when characters are pasted into the address bar after the ?). %FC is ü in ISO-8859-1, but on its own it is not a valid UTF-8 sequence.
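A minimal sketch of that fallback. `decodeLenient` is a hypothetical helper name, and treating every URIError as "probably ISO-8859-1" is an assumption that may not suit every input.

```javascript
// Hypothetical helper: try strict UTF-8 decoding first, then fall back to unescape.
function decodeLenient(encoded) {
    try {
        return decodeURIComponent(encoded);
    } catch (e) {
        if (e instanceof URIError) {
            // Not valid UTF-8: read each %XX as a single ISO-8859-1 character.
            return unescape(encoded);
        }
        throw e; // anything else is unexpected, so let it propagate
    }
}

console.log(decodeLenient("t%C3%BCr")); // "tür" (valid UTF-8)
console.log(decodeLenient("t%FCr"));    // "tür" (ISO-8859-1 fallback)
```

If a string mixes valid UTF-8 with stray bytes, the fallback reads the UTF-8 parts as ISO-8859-1 too, so the result is a best-effort guess.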

  • 2020-11-30 00:33

    What I want to know is what is wrong with escape/unescape?

    They're not “wrong” as such, they're just their own special string format which looks a bit like URI-parameter-encoding but actually isn't. In particular:

    • ‘+’ means plus, not space
    • there is a special “%uNNNN” format for encoding UTF-16 code units, instead of encoding UTF-8 bytes

    So if you use escape() to create URI parameter values you will get the wrong results for strings containing a plus (escape leaves it as ‘+’, which the receiving side reads as a space) or any non-ASCII characters.
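A small sketch of the plus problem. The parameter name `q` and the sample value are made up for illustration, and `URLSearchParams` stands in for whatever form decoding the receiving side does.

```javascript
const value = "a+b c";

// escape leaves "+" untouched and only encodes the space.
const viaEscape = "q=" + escape(value);           // "q=a+b%20c"
// encodeURIComponent encodes the plus as %2B.
const viaUri = "q=" + encodeURIComponent(value);  // "q=a%2Bb%20c"

// A form-style decoder treats a bare "+" as a space.
const seenFromEscape = new URLSearchParams(viaEscape).get("q");
const seenFromUri = new URLSearchParams(viaUri).get("q");

console.log(seenFromEscape); // "a b c"  (the plus was lost)
console.log(seenFromUri);    // "a+b c"  (round-trips)

// Non-ASCII goes wrong in a different way: escape emits a lone ISO-8859-1 byte.
console.log(escape("\u00E9"));             // "%E9"
console.log(encodeURIComponent("\u00E9")); // "%C3%A9"
```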

    escape() could be used as an internal JavaScript-only encoding scheme, for example to escape cookie values. However, now that all browsers support encodeURIComponent (which wasn't originally the case), there's no reason to prefer escape.

    There is only one modern use for escape/unescape that I know of, and that's as a quick way to implement a UTF-8 encoder/decoder, by leveraging the UTF-8 processing in URIComponent handling:

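    // Encode: the result is a string with one code unit (0-255) per UTF-8 byte.
    utf8bytes = unescape(encodeURIComponent(unicodecharacters));
    // Decode: turn such a byte string back into ordinary JavaScript characters.
    unicodecharacters = decodeURIComponent(escape(utf8bytes));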
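A worked example of the trick, as a sketch. The wrapper names `toUtf8ByteString` and `fromUtf8ByteString` and the sample text are hypothetical, introduced here only to make the round trip visible.

```javascript
// Hypothetical wrappers around the two one-liners.
function toUtf8ByteString(text) {
    return unescape(encodeURIComponent(text));
}
function fromUtf8ByteString(bytes) {
    return decodeURIComponent(escape(bytes));
}

const original = "t\u00FCr\u20AC"; // "tür€"
const bytes = toUtf8ByteString(original);

// Each character of `bytes` holds one UTF-8 byte; list them as hex.
const hex = Array.from(bytes, (ch) => ch.charCodeAt(0).toString(16)).join(" ");
console.log(hex); // "74 c3 bc 72 e2 82 ac"

console.log(fromUtf8ByteString(bytes) === original); // true
```

Here encodeURIComponent does the UTF-8 step and unescape undoes only the %XX layer, leaving the raw bytes behind. Note that encodeURIComponent throws a URIError for a string containing a lone surrogate, so the encoder inherits that limit.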
    