How can I check a Python unicode string to see that it *actually* is proper Unicode?

一个人的身影 2021-02-06 08:56

So I have this page:

http://hub.iis.sinica.edu.tw/cytoHubba/

Apparently it's all kinds of messed up, as it gets decoded properly but when I try to save it in po

5 answers
  •  迷失自我
    2021-02-06 09:13

    What exactly are you doing? The content does indeed decode fine as utf-8:

    >>> import urllib
    >>> webcontent = urllib.urlopen("http://hub.iis.sinica.edu.tw/cytoHubba/").read()
    >>> unicodecontent = webcontent.decode("utf-8")
    >>> type(webcontent)
    <type 'str'>
    >>> type(unicodecontent)
    <type 'unicode'>
    >>> type(unicodecontent.encode("utf-8"))
    <type 'str'>

    Make sure you understand the difference between Unicode strings and utf-8 encoded strings, though. What you need to send to the database is unicodecontent.encode("utf-8") (which is the same as webcontent, but you decoded to verify that you don't have invalid byte sequences in your source).
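
    The "decode to verify" step can be wrapped in a small check. This is only a rough sketch, written for Python 3 (where `bytes` is the encoded form and `str` is the Unicode form), unlike the Python 2 session above. The helper names `is_valid_utf8` and `has_lone_surrogates` are made up for illustration and are not part of any library.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")  # error handling defaults to "strict"
    except UnicodeDecodeError:
        return False
    return True


def has_lone_surrogates(text: str) -> bool:
    """Return True if the text cannot be encoded as UTF-8,
    for example because it contains an unpaired surrogate."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


good = "cytoHubba \u2713".encode("utf-8")  # well-formed UTF-8 bytes
bad = b"cytoHubba \xff\xfe"                # 0xFF is never valid in UTF-8

print(is_valid_utf8(good))
print(is_valid_utf8(bad))
print(has_lone_surrogates("\ud800"))
```

    If both checks pass, the data itself is probably fine, and the encoding configured on the database or its connection becomes the more likely culprit.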

    As WoLpH says, I'd also check the encoding settings on the database and on the database connection.
