How can I get Python's ''.encode('unicode_escape') to return escape codes for ASCII characters?

孤街醉人 submitted on 2021-02-08 07:27:07


I am trying to use the encode method of Python strings to return the Unicode escape codes for characters, like this:

>>> print( 'ф'.encode('unicode_escape').decode('utf8') )
\u0444

This works fine with non-ASCII characters, but for ASCII characters it just returns the characters themselves:

>>> print( 'f'.encode('unicode_escape').decode('utf8') )
f

The desired output would be \u0066. This script is for pedagogical purposes.

How can I get the Unicode hex codes for ALL characters?


Your request is kind of weird. In Python, we would usually just use ord for that instead. There is no need for encoding/decoding here.

>>> '"\\U{:08x}"'.format(ord('f'))  # ...or \u{:04x} if you prefer
'"\\U00000066"'
>>> eval(_)
'f'
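
To apply the same ord-based formatting to a whole string rather than a single character, something along these lines should work. This is only a sketch: the helper name escape_every_char is made up for illustration, and it always emits the 8-digit \U form so that it stays valid for characters outside the BMP as well.

```python
def escape_every_char(text):
    # Hypothetical helper: format each code point as an 8-digit \UXXXXXXXX
    # escape, using the same format spec as the one-liner above.
    return ''.join('\\U{:08x}'.format(ord(ch)) for ch in text)

escaped = escape_every_char('fф')
print(escaped)  # \U00000066\U00000444

# Decoding the escapes gives back the original string.
print(escaped.encode('ascii').decode('unicode_escape'))  # fф
```

The round trip through decode('unicode_escape') is just a sanity check that the generated escapes are well formed.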


You'd have to do so manually. If you assume that all your input is within the Unicode BMP (Basic Multilingual Plane), then a straightforward regex will probably be fastest. This replaces every character with its \uhhhh escape:

import re

def unicode_escaped(s, _pattern=re.compile(r'[\x00-\uffff]')):
    # Replace each matched (BMP) character with its 4-digit \uhhhh escape.
    return _pattern.sub(lambda m: '\\u{:04x}'.format(
        ord(m.group(0))), s)

I've explicitly limited the pattern to the BMP so that non-BMP code points are handled gracefully: they don't match the pattern and pass through unchanged.


>>> print(unicode_escaped('foo bar ф'))
\u0066\u006f\u006f\u0020\u0062\u0061\u0072\u0020\u0444
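
If the input may contain characters outside the BMP and you want those escaped too, one possible variant is to match every character and pick the escape width in the callback. This is a sketch under that assumption, not part of the function above; the name unicode_escaped_all is hypothetical.

```python
import re

def unicode_escaped_all(s, _pattern=re.compile(r'.', re.DOTALL)):
    # Hypothetical variant: re.DOTALL makes '.' match newlines too,
    # so every character in s is replaced.
    def _escape(match):
        cp = ord(match.group())
        # \uXXXX covers the BMP; anything above needs the 8-digit \U form.
        fmt = '\\u{:04x}' if cp <= 0xFFFF else '\\U{:08x}'
        return fmt.format(cp)
    return _pattern.sub(_escape, s)

print(unicode_escaped_all('f😀'))  # \u0066\U0001f600
```

Matching with '.' is likely a little slower than a character class restricted to the BMP, so the BMP-only pattern is still the simpler choice when the input is known to stay within it.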

