So I have this page:
http://hub.iis.sinica.edu.tw/cytoHubba/
Apparently it's all kinds of messed up, as it gets decoded properly but when I try to save it in po
A Python unicode object is a sequence of Unicode code points and is, by definition, proper Unicode. A Python str string is a sequence of bytes that may represent Unicode characters encoded with a certain encoding (UTF-8, Latin-1, Big5, ...).
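To make the distinction concrete, here is a small sketch. The sample text is my own choice, not taken from the page in question, and the u'' escapes are used so the snippet behaves the same on Python 2 and Python 3 (where the two types are called str and bytes).

```python
# Two code points, written as escapes so the file's own encoding does not matter.
text = u'\u4e2d\u6587'

utf8_bytes = text.encode('utf-8')   # 3 bytes per character for this text
big5_bytes = text.encode('big5')    # 2 bytes per character for this text

print(len(text))        # number of code points
print(len(utf8_bytes))  # number of bytes in the UTF-8 form
print(len(big5_bytes))  # number of bytes in the Big5 form

# Same text, different byte sequences.
print(utf8_bytes == big5_bytes)
```

The unicode object is the text itself; each encoded form is just one possible byte representation of it.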
The first question is whether source is a unicode object or a str string.
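One way to answer that question is an isinstance check. The helper name kind_of below is hypothetical, and the two type aliases are only there so the sketch runs on both Python 2 and Python 3; on Python 2 alone, isinstance(source, unicode) would do.

```python
text_type = type(u'')    # unicode on Python 2, str on Python 3
bytes_type = type(b'')   # str on Python 2, bytes on Python 3


def kind_of(source):
    """Return 'unicode' for decoded text, 'bytes' for an encoded string."""
    if isinstance(source, text_type):
        return 'unicode'
    if isinstance(source, bytes_type):
        return 'bytes'
    raise TypeError('expected text or bytes, got %r' % type(source))


print(kind_of(u'abc'))
print(kind_of(u'abc'.encode('utf-8')))
```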
That source.encode("utf-8") works just means that you can convert source to a UTF-8 encoded string, but are you doing it before you pass it to the database function? The database seems to expect its inputs to be encoded as UTF-8, and complains that the equivalent of source.decode("utf-8") fails.
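That failure can be reproduced without a database. This sketch assumes the bytes are Big5, which is only a guess at the page's real encoding; the sample text is again my own.

```python
# Byte string in Big5, standing in for what might be handed to the database.
big5_bytes = u'\u4e2d\u6587'.encode('big5')

try:
    big5_bytes.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError as exc:
    # These Big5 bytes are not a valid UTF-8 sequence.
    decoded_ok = False
    print('not valid UTF-8: %s' % exc)

# Decoding with the right codec first, then encoding to UTF-8, works.
utf8_bytes = big5_bytes.decode('big5').encode('utf-8')
print(utf8_bytes.decode('utf-8') == u'\u4e2d\u6587')
```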
If source is a unicode object, it should be encoded to UTF-8 before you pass it to the database:
source = u'abc'  # a unicode object
call_db(source.encode('utf-8'))  # hand the database UTF-8 encoded bytes
If source is a str encoded as something other than UTF-8, you should decode it from that encoding and then encode the resulting unicode object to UTF-8:
source = 'abc'  # a str holding bytes in some other encoding, here Big5
call_db(source.decode('Big5').encode('utf-8'))  # decode first, then re-encode as UTF-8
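Both cases can be folded into one helper. This is only a sketch: the name to_utf8 and its source_encoding parameter are hypothetical, and you still have to know (or find out) what encoding a byte string is really in.

```python
def to_utf8(source, source_encoding='utf-8'):
    """Return source as UTF-8 encoded bytes.

    source_encoding is only used when source is already a byte string.
    """
    if isinstance(source, type(u'')):
        # Already decoded text: just encode it.
        return source.encode('utf-8')
    # Byte string: decode from its real encoding, then re-encode.
    return source.decode(source_encoding).encode('utf-8')


sample = u'\u4e2d\u6587'
print(to_utf8(sample) == to_utf8(sample.encode('big5'), 'big5'))
```

With something like this in place, the call would become call_db(to_utf8(source, 'Big5')) regardless of which type source turns out to be.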