How do I convert Unicode escape sequences to Unicode characters in a Python string?

抹茶落季 2020-11-30 04:06

When I tried to get the content of a tag using "unicode(head.contents[3])", I get output similar to this: "Christensen Sk\xf6ld". I want the escape sequence to be re

3 answers
  • 2020-11-30 04:29

    Assuming Python sees the name as a normal string, you'll first have to decode it to unicode:

    >>> name
    'Christensen Sk\xf6ld'
    >>> unicode(name, 'latin-1')
    u'Christensen Sk\xf6ld'
    

    Another way of achieving this:

    >>> name.decode('latin-1')
    u'Christensen Sk\xf6ld'
    

    Note the "u" in front of the string, signalling that it is Unicode. If you print this, the accented letter is shown properly:

    >>> print name.decode('latin-1')
    Christensen Sköld
    

    BTW: when necessary, you can use the "encode" method to turn the Unicode string into, e.g., a UTF-8 byte string:

    >>> name.decode('latin-1').encode('utf-8')
    'Christensen Sk\xc3\xb6ld'
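
For readers on Python 3, the same round trip might look like the sketch below. It assumes the input arrives as Latin-1 encoded `bytes` (the Python 3 counterpart of the Python 2 `str` above); the names `raw`, `name`, and `utf8` are illustrative and not from the original answer.

```python
# Python 3 sketch (assumption: the data is Latin-1 encoded bytes).
raw = b'Christensen Sk\xf6ld'    # byte 0xF6 is "o with diaeresis" in Latin-1

name = raw.decode('latin-1')     # bytes -> str (Unicode text)
utf8 = name.encode('utf-8')      # str -> UTF-8 encoded bytes

# ascii() gives an escaped, terminal-safe view of the text.
print(ascii(name))   # 'Christensen Sk\xf6ld'
print(utf8)          # b'Christensen Sk\xc3\xb6ld'
```

On a terminal that supports the character, `print(name)` would display the accented letter directly, as in the Python 2 example.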
    
  • 2020-11-30 04:46

    Given a byte string with Unicode escapes b"\N{SNOWMAN}", b"\N{SNOWMAN}".decode('unicode-escape') will produce the expected Unicode string u'\u2603'.
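
A rough Python 3 sketch of the same idea follows. It assumes the escapes are present as literal backslash sequences in the data; `decode_escapes` is a hypothetical helper name introduced here for illustration, and it only handles ASCII-only input.

```python
# Raw bytes literals keep the backslashes as literal characters.
raw = rb'Christensen Sk\xf6ld and a \N{SNOWMAN}'

# The unicode_escape codec interprets \xNN, \uNNNN and \N{NAME} sequences.
text = raw.decode('unicode_escape')


def decode_escapes(s):
    """Hypothetical helper: interpret literal escapes in an ASCII-only str."""
    # encode('ascii') raises UnicodeEncodeError if s contains non-ASCII text,
    # which avoids silently mangling characters that are already decoded.
    return s.encode('ascii').decode('unicode_escape')


print(ascii(text))                        # escaped, terminal-safe view
print(ascii(decode_escapes(r'Sk\xf6ld')))
```

The ASCII restriction in the helper is a deliberate choice: `unicode_escape` treats its input as Latin-1, so passing text that already contains non-ASCII characters through it can corrupt them.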

  • 2020-11-30 04:48

    I suspect that it's actually working correctly. The interactive interpreter shows the repr() of a string, which in Python 2 escapes non-ASCII characters, since not all terminals support Unicode. If you actually print the string, though, it should work. See the following example:

    >>> u'\xcfa'
    u'\xcfa'
    >>> print u'\xcfa'
    Ïa
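
The difference between a string's value and its escaped display form can also be checked without relying on the terminal. The sketch below assumes Python 3, where `ascii()` produces roughly what Python 2's `repr()` showed for a Unicode string (minus the `u` prefix); the names `s` and `shown` are illustrative.

```python
s = '\xcfa'          # two characters: U+00CF followed by "a"

shown = ascii(s)     # escaped display form, safe for any terminal

print(len(s))        # 2
print(shown)         # '\xcfa'
print(len(shown))    # 7 (quotes, backslash, and the letters x, c, f, a)
```

The escape sequence exists only in the display form; the string itself holds the actual character.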
    