Escaping Unicode strings in Python


In Python, these three commands print the same emoji:

print "\xF0\x9F\x8C\x80"
         


        
4 answers
  •  清酒与你
    2021-01-03 04:56

    See Unicode Literals in Python Source Code

    In Python source code, Unicode literals are written as strings prefixed with the ‘u’ or ‘U’ character: u'abcdefghijk'. Specific code points can be written using the \u escape sequence, which is followed by four hex digits giving the code point. The \U escape sequence is similar, but expects 8 hex digits, not 4.

    In [1]: "\xF0\x9F\x8C\x80".decode('utf-8')
    Out[1]: u'\U0001f300'
    
    In [2]: u'\U0001F300'.encode('utf-8')
    Out[2]: '\xf0\x9f\x8c\x80'
    
    In [3]: u'\ud83c\udf00'.encode('utf-8')
    Out[3]: '\xf0\x9f\x8c\x80'
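
The third input gives the same bytes because `\ud83c\udf00` is the UTF-16 surrogate pair for U+1F300. A minimal sketch of that arithmetic follows, written for Python 3 rather than the Python 2 used in the session above; the helper name `to_surrogate_pair` is hypothetical and only for illustration.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000      # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go into the low surrogate
    return high, low


high, low = to_surrogate_pair(0x1F300)
print(hex(high), hex(low))  # 0xd83c 0xdf00
```

The two values match the `\ud83c` and `\udf00` escapes used in input 3.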
    

    \uhhhh     --> Unicode character with 16-bit hex value  
    \Uhhhhhhhh --> Unicode character with 32-bit hex value
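
As a rough illustration of the two escape forms, here is a sketch that assumes Python 3, where every `str` literal is already Unicode and the `u` prefix is optional. The variable names are only for illustration. Note that Python 3 treats the two `\u` surrogate escapes as two separate code points, unlike the Python 2 session above.

```python
# Assumes Python 3: str literals are Unicode without a u prefix.
short_form = "\u00e9"      # 4 hex digits
long_form = "\U000000e9"   # 8 hex digits, same code point
emoji = "\U0001F300"       # above U+FFFF, so it needs the 8-digit form

print(short_form == long_form)       # True
print(len(emoji), hex(ord(emoji)))   # 1 0x1f300

# In Python 3 the surrogate escapes stay as two code points...
pair = "\ud83c\udf00"
print(len(pair))                     # 2

# ...and can be merged by round-tripping through UTF-16.
joined = pair.encode("utf-16", "surrogatepass").decode("utf-16")
print(joined == emoji)               # True
```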
    

    In Unicode escapes, the first form gives four hex digits to encode a 2-byte (16-bit) character code point, and the second gives eight hex digits for a 4-byte (32-bit) code point. Byte strings support only hex escapes for encoded text and other forms of byte-based data.
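
To make the byte-string side concrete, here is a small sketch, again assuming Python 3, where byte strings are written with a `b` prefix. It shows the hex-escaped UTF-8 bytes decoding to a single code point, and the standard `unicode_escape` codec producing an escaped form of the text.

```python
# Assumes Python 3: bytes literals take \x hex escapes, one per byte.
raw = b"\xF0\x9F\x8C\x80"      # the UTF-8 encoding, four bytes
text = raw.decode("utf-8")     # one Unicode code point

print(len(raw), len(text))           # 4 1
print(text == "\U0001F300")          # True
print(text.encode("utf-8") == raw)   # True

# The unicode_escape codec writes the text back out as an ASCII escape.
print(text.encode("unicode_escape"))  # b'\\U0001f300'
```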
