How can I check a Python unicode string to see that it actually is proper Unicode?

一个人的身影 2021-02-06 08:56

So I have this page:

http://hub.iis.sinica.edu.tw/cytoHubba/

Apparently it's all kinds of messed up, as it gets decoded properly but when I try to save it in po

5 answers

滥情空心 (original poster)

2021-02-06 09:05
A Python unicode object is a sequence of Unicode code points and is by definition proper Unicode. A Python str string (in Python 2) is a sequence of bytes that might be Unicode characters encoded with a certain encoding (UTF-8, Latin-1, Big5, ...).
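
To make the code point versus byte distinction concrete, here is a small sketch. It is written for Python 3, where str plays the role of Python 2's unicode and bytes plays the role of Python 2's str; the sample text is just an illustration and is not taken from the page in the question.

```python
# Python 3 sketch: `str` here corresponds to Python 2's `unicode`,
# and `bytes` corresponds to Python 2's `str`.
text = "中文"                       # two Unicode code points

utf8_bytes = text.encode("utf-8")   # three bytes per character for this text
big5_bytes = text.encode("big5")    # two bytes per character for this text

# The same text yields different byte sequences under different encodings,
# and each byte sequence only decodes back correctly with its own encoding.
same_text = utf8_bytes.decode("utf-8") == big5_bytes.decode("big5") == text
```

The text itself has no encoding; only the byte strings do, which is why you need to know the encoding before you can decode them.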

The first question is whether source is a unicode object or a str string. That source.encode("utf-8") works just means that you can convert source to a UTF-8 encoded string, but are you doing that before you pass it to the database function? The database seems to expect its inputs to be encoded as UTF-8, and complains that the equivalent of source.decode("utf-8") fails.
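
One way to reproduce the check the database appears to be doing is to try the decode yourself. This is a minimal sketch in Python 3 (bytes instead of Python 2's str); is_valid_utf8 is a hypothetical helper name, not something from your code or from any database driver.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")   # raises UnicodeDecodeError on invalid sequences
    except UnicodeDecodeError:
        return False
    return True
```

If this returns False for the bytes you are about to hand to the database, they are in some other encoding and need to be decoded with that encoding first.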

If source is a unicode object, it should be encoded to UTF-8 before you pass it to the database:
```
source = u'abc'
call_db(source.encode('utf-8'))
```
If source is a str encoded as something other than UTF-8, you should decode it using that encoding and then encode the resulting unicode object to UTF-8:
```
source = 'abc'
call_db(source.decode('Big5').encode('utf-8'))
```
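
The two cases above can be folded into one function. This is only a sketch for Python 3 (str and bytes instead of unicode and str); to_utf8 and its source_encoding parameter are hypothetical names, and you still have to know or guess the real encoding of the incoming bytes.

```python
def to_utf8(source, source_encoding="utf-8"):
    """Return `source` as UTF-8 encoded bytes (hypothetical helper).

    `source` may be text (str) or bytes in `source_encoding`.
    """
    if isinstance(source, str):
        # Already text: just encode it.
        return source.encode("utf-8")
    # Bytes: decode with the assumed encoding first, then re-encode.
    return source.decode(source_encoding).encode("utf-8")
```

For example, to_utf8(raw_bytes, "big5") would convert Big5 bytes to UTF-8, and it raises UnicodeDecodeError if the bytes are not actually valid in the encoding you named.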