How can I check a Python unicode string to see that it actually is proper Unicode?

一个人的身影 2021-02-06 08:56

So I have this page:

http://hub.iis.sinica.edu.tw/cytoHubba/

Apparently it's all kinds of messed up, as it gets decoded properly but when I try to save it in po

5 answers

迷失自我 (original poster)

2021-02-06 09:13
What exactly are you doing? The content does indeed decode fine as utf-8:
```
>>> import urllib
>>> webcontent = urllib.urlopen("http://hub.iis.sinica.edu.tw/cytoHubba/").read()
>>> unicodecontent = webcontent.decode("utf-8")
>>> type(webcontent)
<type 'str'>
>>> type(unicodecontent)
<type 'unicode'>
>>> type(unicodecontent.encode("utf-8"))
<type 'str'>
```
Make sure you understand the difference between Unicode strings and utf-8 encoded strings, though. What you need to send to the database is unicodecontent.encode("utf-8") (which is the same as webcontent, but you decoded to verify that you don't have invalid byte sequences in your source).
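
The decode-to-verify step described above can be sketched roughly as follows. This is only an illustration: the helper name `decode_utf8_or_none` and the sample byte strings are made up for the example, and it uses byte literals instead of fetching the page so that it runs without network access. It should behave the same on Python 2 and Python 3.

```python
def decode_utf8_or_none(raw):
    """Return the decoded text if raw is valid UTF-8, otherwise None.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        # The default "strict" error mode raises on invalid byte sequences.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


good = b"caf\xc3\xa9"  # valid UTF-8 bytes (made-up sample data)
bad = b"caf\xe9"       # a lone Latin-1 byte, not valid UTF-8

text = decode_utf8_or_none(good)
assert text is not None

# Re-encoding the decoded text gives back the original bytes,
# which is the form that would be sent on to the database.
assert text.encode("utf-8") == good

# Invalid input is rejected instead of being silently passed along.
assert decode_utf8_or_none(bad) is None
```

If the decode succeeds, the source bytes were well-formed UTF-8; whether the database then stores them correctly depends on its own settings, which this sketch does not cover.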

As WoLpH says, I'd also check the settings on the database and on the database connection.