How can I get Python ''.encode('unicode_escape') to return escape codes for ASCII?

孤街醉人 submitted on 2021-02-08 07:27:07

Question


I am trying to use the encode method of Python strings to return the Unicode escape codes for characters, like this:

>>> print( 'ф'.encode('unicode_escape').decode('utf8') )
\u0444

This works fine with non-ASCII characters, but for ASCII characters it just returns the characters themselves:

>>> print( 'f'.encode('unicode_escape').decode('utf8') )
f
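To see why 'f' comes back unchanged, the short sketch below runs a few characters through the same codec (the helper name codec_escape is only for illustration). In CPython, unicode_escape escapes only the backslash, tab, newline and carriage return, other control characters, and anything outside printable ASCII; printable ASCII such as 'f' is copied through as-is. Note also that code points U+0080 to U+00FF come out as \xhh rather than \u00hh, so the codec would not give a uniform \uhhhh listing even for non-ASCII text.

```python
def codec_escape(ch):
    # Encode with the built-in codec; its output is pure ASCII bytes.
    return ch.encode('unicode_escape').decode('ascii')


# A printable ASCII letter, control characters, a backslash,
# a Latin-1 letter, a Cyrillic letter and a non-BMP character.
samples = ['f', '\t', '\n', '\x01', '\\', '\xe9', '\u0444', '\U0001f600']

for ch in samples:
    print('U+{:04X} -> {}'.format(ord(ch), codec_escape(ch)))
```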

The desired output would be \u0066. This script is for pedagogical purposes.

How can I get the unicode hex codes for ALL characters?


Answer 1:


Your request is kind of weird. In Python, we would usually just use ord() for that instead. There is no need for encoding/decoding here.

>>> '"\\U{:08x}"'.format(ord('f'))  # ...or \u{:04x} if you prefer
'"\\U00000066"'
>>> eval(_)
'f'
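Extending the ord() idea from a single character to a whole string, a minimal sketch could pick the 4-digit \u form for BMP characters and the 8-digit \U form for anything above U+FFFF. The function name escape_all is hypothetical. It also checks the result by decoding with the unicode_escape codec rather than eval(), which avoids evaluating arbitrary text as code.

```python
def escape_all(s):
    # Write every code point of s as an escape sequence.
    parts = []
    for ch in s:
        cp = ord(ch)  # integer code point
        if cp <= 0xFFFF:
            # BMP character: 4 hex digits are enough.
            parts.append('\\u{:04x}'.format(cp))
        else:
            # Supplementary plane: needs the 8-digit \U form.
            parts.append('\\U{:08x}'.format(cp))
    return ''.join(parts)


escaped = escape_all('foo ф')
print(escaped)  # \u0066\u006f\u006f\u0020\u0444

# Round-trip check without eval(): the unicode_escape codec
# turns the escapes back into the original text.
assert escaped.encode('ascii').decode('unicode_escape') == 'foo ф'
```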



Answer 2:


You'd have to do so manually. If you can assume that all your input is within the Unicode BMP (Basic Multilingual Plane), a straightforward regex will probably be fastest; this one replaces every character with its \uhhhh escape:

import re

def unicode_escaped(s, _pattern=re.compile(r'[\x00-\uffff]')):
    return _pattern.sub(lambda m: '\\u{:04x}'.format(
        ord(m.group(0))), s)

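I've explicitly limited the pattern to the BMP so that non-BMP code points (above U+FFFF, which do not fit in four hex digits) are left unchanged instead of being given a malformed \u escape.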
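To make the BMP limit concrete, the sketch below reproduces the answer's function and compares it with naively formatting a non-BMP code point as \u{:04x}. The naive form produces five hex digits, and since \u always consumes exactly four, a decoder reads it as a different character followed by a stray '0'. The BMP-limited pattern simply skips such characters. The test emoji U+1F600 is an arbitrary choice.

```python
import re


# Answer 2's function, reproduced so this snippet runs on its own.
def unicode_escaped(s, _pattern=re.compile(r'[\x00-\uffff]')):
    return _pattern.sub(lambda m: '\\u{:04x}'.format(
        ord(m.group(0))), s)


emoji = '\U0001f600'  # U+1F600, outside the BMP

# Forcing a 4-digit \u escape yields five hex digits...
naive = '\\u{:04x}'.format(ord(emoji))
print(naive)  # \u1f600

# ...which unicode_escape decodes as U+1F60 followed by a literal '0'.
print(naive.encode('ascii').decode('unicode_escape') == emoji)  # False

# The BMP-limited pattern escapes 'a' but leaves the emoji untouched.
print(unicode_escaped('a' + emoji) == '\\u0061' + emoji)  # True
```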

Demo:

>>> print(unicode_escaped('foo bar ф'))
\u0066\u006f\u006f\u0020\u0062\u0061\u0072\u0020\u0444


Source: https://stackoverflow.com/questions/42077422/how-can-i-get-python-encodeunicode-escape-to-return-escape-codes-for-asci
