Question
I am trying to use the encode method of Python strings to return the Unicode escape codes for characters, like this:
>>> print( 'ф'.encode('unicode_escape').decode('utf8') )
\u0444
This works fine with non-ASCII characters, but for ASCII characters it just returns the characters themselves:
>>> print( 'f'.encode('unicode_escape').decode('utf8') )
f
The desired output would be \u0066. This script is for pedagogical purposes.
How can I get the Unicode hex codes for ALL characters?
Answer 1:
Your request is kind of weird. In Python, we would usually just use ord for that instead. There is no need for encoding/decoding here.
>>> '"\\U{:08x}"'.format(ord('f')) # ...or \u{:04x} if you prefer
'"\\U00000066"'
>>> eval(_)
'f'
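The ord call above handles a single character. For a whole string, the same idea can be applied character by character. The following is only a sketch: the name escape_all is hypothetical, and the choice to use \uXXXX for code points up to U+FFFF and \UXXXXXXXX for anything larger is an assumption about the desired output, not something the question specifies.

```python
def escape_all(text):
    """Return text with every character written as an escape sequence.

    escape_all is a hypothetical helper name used for illustration.
    """
    parts = []
    for ch in text:
        code = ord(ch)  # integer code point of this character
        if code <= 0xFFFF:
            # Code points in the BMP fit in four hex digits.
            parts.append('\\u{:04x}'.format(code))
        else:
            # Anything above U+FFFF needs the eight-digit \U form.
            parts.append('\\U{:08x}'.format(code))
    return ''.join(parts)


print(escape_all('f'))    # \u0066
print(escape_all('fф'))   # \u0066\u0444
```

Because the formatting is done with ord and str.format only, no encode/decode step is involved.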
Answer 2:
You'd have to do so manually. If you assume that all your input is within the Unicode BMP (Basic Multilingual Plane), a straightforward regex will probably be fastest; this replaces every character with its \uhhhh escape:
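import re

def unicode_escaped(s, _pattern=re.compile(r'[\x00-\uffff]')):
    # Replace each BMP character with its \uhhhh escape.
    # The pattern is compiled once, when the function is defined.
    return _pattern.sub(
        lambda m: '\\u{:04x}'.format(ord(m.group(0))), s)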
I've explicitly limited the pattern to the BMP so that non-BMP code points are handled gracefully: they do not match the pattern and are left unchanged in the output.
Demo:
>>> print(unicode_escaped('foo bar ф'))
\u0066\u006f\u006f\u0020\u0062\u0061\u0072\u0020\u0444
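If the input may contain characters outside the BMP, one possible variation is to match every character and pick the escape width in the replacement callback. This is a sketch rather than part of the answer above: the names unicode_escaped_full, _ANY_CHAR and _escape_match are made up for illustration, and it assumes Python 3, where a str is a sequence of code points. The last lines check that decoding the result with the unicode_escape codec gives back the original string.

```python
import re

# Match any single character, including newlines (hypothetical helper name).
_ANY_CHAR = re.compile(r'.', re.DOTALL)


def _escape_match(match):
    """Format one matched character as \\uXXXX or \\UXXXXXXXX."""
    code = ord(match.group(0))
    if code <= 0xFFFF:
        return '\\u{:04x}'.format(code)
    return '\\U{:08x}'.format(code)


def unicode_escaped_full(s):
    """Escape every character of s, including non-BMP ones (illustrative name)."""
    return _ANY_CHAR.sub(_escape_match, s)


sample = 'f ф \U0001F600'
escaped = unicode_escaped_full(sample)
print(escaped)  # \u0066\u0020\u0444\u0020\U0001f600

# Round trip: the escaped text is pure ASCII, so it can be decoded back.
restored = escaped.encode('ascii').decode('unicode_escape')
assert restored == sample
```

When re.sub is given a function as the replacement, the returned string is inserted literally, so the backslashes in the escapes are not reinterpreted.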
Source: https://stackoverflow.com/questions/42077422/how-can-i-get-python-encodeunicode-escape-to-return-escape-codes-for-asci