ValueError: unichr() arg not in range(0x10000) (narrow Python build)

星月不相逢 2020-12-05 19:07

I am trying to convert an HTML entity to a Unicode character. The HTML entity is &#976918;. When I try the following, I get the error shown in the title:

unichr(int(976918))
3 Answers
  • 2020-12-05 19:46

    Here's an alternate workaround that I developed with the struct module.

    import struct

    def unichar(i):
        try:
            return unichr(i)
        except ValueError:
            # Narrow build: pack as a native 32-bit int and decode as UTF-32.
            return struct.pack('i', i).decode('utf-32')
    
    >>> unichar(int('976918'))
    u'\U000ee816'
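
The workaround above relies on the native byte order for both packing and decoding. A minimal sketch of the same idea with the byte order spelled out follows; the name `code_point_to_text` is hypothetical and not part of the original answer, and the code is written so that it should run on both Python 2 and Python 3.

```python
import struct


def code_point_to_text(code_point):
    """Return the text for a code point via a UTF-32 round trip."""
    # '<I' packs an unsigned 32-bit integer in little-endian order,
    # which is the layout the 'utf-32-le' codec expects (no BOM needed).
    return struct.pack('<I', code_point).decode('utf-32-le')


text = code_point_to_text(976918)
# Encoding to UTF-8 gives a result that does not depend on the build type.
print(repr(text.encode('utf-8')))
```

Fixing the byte order on both sides avoids depending on the platform's native layout, at the cost of skipping the `unichr()` fast path.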
    
  • 2020-12-05 19:51

    In order for this to work, you either need to build Python yourself, specifying

    ./configure --enable-unicode=ucs4
    

    before compiling, or else you need to move to Python 3.

    Even then, there are apparently problems on Windows, which are expected to be fixed in the next version of Python (3.3).
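
To find out which kind of build is in use before deciding on a fix, `sys.maxunicode` can be inspected. This is a small sketch; the helper name `is_narrow_build` is an assumption made for illustration.

```python
import sys


def is_narrow_build():
    """Return True when the interpreter stores text as UTF-16 code units."""
    # Narrow builds report 0xFFFF; wide builds and Python 3.3+ report 0x10FFFF.
    return sys.maxunicode == 0xFFFF


print(hex(sys.maxunicode), is_narrow_build())
```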

  • 2020-12-05 19:55

    You can decode a string that has a Unicode escape (\U followed by 8 hex digits, zero-padded) using the "unicode-escape" encoding:

    >>> s = "\\U%08x" % 976918
    >>> s
    '\\U000ee816'
    
    >>> c = s.decode('unicode-escape')
    >>> c
    u'\U000ee816'
    

    On a narrow build it's stored as a UTF-16 surrogate pair:

    >>> list(c)
    [u'\udb7a', u'\udc16']
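
The two values above follow from the standard UTF-16 arithmetic. As a rough sketch (the function name `to_surrogate_pair` is hypothetical), the pair can be computed by hand like this:

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    # Subtracting 0x10000 leaves a 20-bit value.
    offset = code_point - 0x10000
    # The top 10 bits go into the high (lead) surrogate.
    high = 0xD800 + (offset >> 10)
    # The bottom 10 bits go into the low (trail) surrogate.
    low = 0xDC00 + (offset & 0x3FF)
    return high, low


high, low = to_surrogate_pair(976918)
print("%04x %04x" % (high, low))  # db7a dc16
```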
    

    This surrogate pair is processed correctly as a single code point during encoding:

    >>> c.encode('utf-8')
    '\xf3\xae\xa0\x96'
    
    >>> '\xf3\xae\xa0\x96'.decode('utf-8')
    u'\U000ee816'
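
For reference, the four UTF-8 bytes shown above can be derived from the code point itself rather than from the surrogates. The sketch below assumes a code point in the supplementary planes; `utf8_bytes_4` is a hypothetical helper name.

```python
def utf8_bytes_4(code_point):
    """Return the 4-byte UTF-8 sequence for a supplementary code point."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a 4-byte sequence")
    return [
        0xF0 | (code_point >> 18),           # lead byte: top 3 bits
        0x80 | ((code_point >> 12) & 0x3F),  # continuation: next 6 bits
        0x80 | ((code_point >> 6) & 0x3F),   # continuation: next 6 bits
        0x80 | (code_point & 0x3F),          # continuation: last 6 bits
    ]


print(" ".join("%02x" % b for b in utf8_bytes_4(976918)))  # f3 ae a0 96
```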
    