JavaScript unescape() vs. Python urllib.unquote()

后端未结


From reading various posts, it seems like JavaScript's unescape() is equivalent to Python's urllib.unquote(), however when I test both I get differ


1 Answer

执念已碎

2021-01-06 00:30
%uxxxx is a non-standard URL encoding scheme that is not supported by urllib.parse.unquote() (Python 3) or urllib.unquote() (Python 2).

It was only ever part of ECMAScript ECMA-262 3rd edition; the format was rejected by the W3C and was never a part of an RFC.
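To see the difference in practice, here is a minimal Python 3 check. The input strings are just illustrative; the point is that standard %xx escapes are decoded while %uxxxx sequences pass through unchanged:

```python
from urllib.parse import unquote

# Standard percent-encoding is decoded as expected.
standard = unquote('%3Cbr%3E')

# %uxxxx is not recognised, so the input comes back untouched.
nonstandard = unquote('%u003c%u0062%u0072%u003e')

print(repr(standard))
print(repr(nonstandard))
```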

You could use a regular expression to convert such codepoints:
```
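import re

try:
    unichr  # only defined in Python 2
except NameError:
    unichr = chr  # Python 3: chr() already handles Unicode code points

quoted = '%u003c%u0062%u0072%u003e'  # example input
# Replace each %uxxxx (or %uxx) escape with the character it encodes.
re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: unichr(int(m.group(1), 16)), quoted)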
```
This decodes both the %uxxxx and the %uxx forms that ECMAScript 3rd edition can decode.

Demo:
```
>>> import re
>>> quoted = '%u003c%u0062%u0072%u003e'
>>> re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: chr(int(m.group(1), 16)), quoted)
'<br>'
>>> altquoted = '%u3c%u0062%u0072%u3e'
>>> re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: chr(int(m.group(1), 16)), altquoted)
'<br>'
```
However, you should avoid using this encoding altogether if possible.
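If you are stuck with input produced by JavaScript's escape(), a slightly fuller sketch might look like the following. The name `js_unescape` is hypothetical, and the function is an approximation for Python 3 only: it treats both %uXXXX and %XX as single code units (so %E9 becomes 'é' rather than being interpreted as a UTF-8 byte), then merges UTF-16 surrogate pairs. It does not handle the two-digit %uxx form covered by the regex above.

```python
import re

# Match either %uXXXX (four hex digits) or %XX (two hex digits).
_ESCAPE_RE = re.compile(r'%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})')


def js_unescape(text):
    """Hypothetical helper approximating JavaScript's unescape()."""
    def _decode(match):
        hex_digits = match.group(1) or match.group(2)
        # Each escape stands for one UTF-16 code unit.
        return chr(int(hex_digits, 16))

    decoded = _ESCAPE_RE.sub(_decode, text)
    # Round-trip through UTF-16 so that surrogate pairs such as
    # %uD83D%uDE00 are combined into a single character.
    raw = decoded.encode('utf-16-le', 'surrogatepass')
    return raw.decode('utf-16-le', 'surrogatepass')


print(js_unescape('%u003c%u0062%u0072%u003e'))
print(js_unescape('%3Cbr%3E'))
```

Both calls print `<br>`. Anything that does not match one of the two escape patterns, such as a lone `%`, is left as it is.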