JavaScript unescape() vs. Python urllib.unquote()

无人共我 2021-01-06 00:07

From reading various posts, it seems like JavaScript's unescape() is equivalent to Python's urllib.unquote(); however, when I test both I get differ

1 answer
  • 2021-01-06 00:30

    %uxxxx is a non-standard URL encoding scheme that is not supported by urllib.parse.unquote() (Py 3) / urllib.unquote() (Py 2).
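
    To see the difference, here is a minimal Python 3 check. The sample strings are made up for illustration: ordinary %xx escapes are decoded, while %uxxxx sequences pass through untouched.

```python
from urllib.parse import unquote

# Standard percent-encoding is decoded as expected.
standard = unquote('%3Cbr%3E')

# %uxxxx is not recognised as an escape, so it is left as-is.
nonstandard = unquote('%u003cbr%u003e')

print(standard)
print(nonstandard)
```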

    It was only ever part of ECMAScript ECMA-262 3rd edition; the format was rejected by the W3C and was never a part of an RFC.

    You could use a regular expression to convert such codepoints:

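    import re

    try:
        unichr  # only in Python 2
    except NameError:
        unichr = chr  # Python 3

    # quoted is the %u-encoded input string; each %uxxxx / %uxx match
    # is replaced by the character with that hexadecimal codepoint.
    result = re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: unichr(int(m.group(1), 16)), quoted)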
    

    This decodes both the %uxxxx and the %uxx forms that ECMAScript 3rd edition can decode.

    Demo:

    >>> import re
    >>> quoted = '%u003c%u0062%u0072%u003e'
    >>> re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: chr(int(m.group(1), 16)), quoted)
    '<br>'
    >>> altquoted = '%u3c%u0062%u0072%u3e'
    >>> re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: chr(int(m.group(1), 16)), altquoted)
    '<br>'
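
    If the input mixes %uxxxx escapes with ordinary %xx escapes, one option is to handle both in a single pass. The following is only a sketch for Python 3: the names unquote_mixed and _decode are hypothetical, and it assumes the %xx escapes are UTF-8 (the urllib.parse.unquote() default), which is not what JavaScript's unescape() does with them.

```python
import re
from urllib.parse import unquote

# Either a %uxxxx / %uxx escape (group 1), or a run of standard %xx escapes (group 2).
_ESCAPE = re.compile(r'%u([0-9a-fA-F]{4}|[0-9a-fA-F]{2})|((?:%[0-9a-fA-F]{2})+)')

def _decode(match):
    if match.group(1) is not None:
        # Non-standard form: the hex digits are the codepoint itself.
        return chr(int(match.group(1), 16))
    # Standard form: let urllib decode the whole run (UTF-8 assumed).
    return unquote(match.group(2))

def unquote_mixed(quoted):
    """Decode %uxxxx and %xx escapes in one pass (hypothetical helper)."""
    return _ESCAPE.sub(_decode, quoted)

print(unquote_mixed('%u003cbr%3E'))
```

    Doing both substitutions in one pass means a character produced by one escape is never re-read as the start of another, which chaining the regular expression and unquote() one after the other could not guarantee.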
    

    However, you should avoid using this encoding altogether if possible.
