So I have this page:
http://hub.iis.sinica.edu.tw/cytoHubba/
Apparently it's all kinds of messed up, as it gets decoded properly but when I try to save it in po
What exactly are you doing? The content does indeed decode fine as UTF-8:
>>> import urllib
>>> webcontent = urllib.urlopen("http://hub.iis.sinica.edu.tw/cytoHubba/").read()
>>> unicodecontent = webcontent.decode("utf-8")
>>> type(webcontent)
<type 'str'>
>>> type(unicodecontent)
<type 'unicode'>
>>> type(unicodecontent.encode("utf-8"))
<type 'str'>
Make sure you understand the difference between Unicode strings and UTF-8 encoded strings, though. What you need to send to the database is unicodecontent.encode("utf-8") (which is the same as webcontent, but you decoded to verify that you don't have invalid byte sequences in your source).
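For anyone on Python 3, here is a rough sketch of the same round trip. It is an assumption on my part that you can move to Python 3, where the Python 2 str/unicode pair becomes bytes/str. The sample bytes are a hypothetical stand-in for the downloaded page, and is_valid_utf8 is a helper name I made up for illustration, not something from urllib.

```python
# Python 3 sketch: bytes is the encoded form, str is the Unicode form.

def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical stand-in for the page body; the real one would come from
# urllib.request.urlopen(url).read(), which returns bytes.
webcontent = "cytoHubba \u4e2d\u592e\u7814\u7a76\u9662".encode("utf-8")

# Decoding verifies there are no invalid byte sequences in the source.
unicodecontent = webcontent.decode("utf-8")

# Re-encoding gives back exactly the bytes we started with.
roundtrip = unicodecontent.encode("utf-8")

print(type(webcontent).__name__)      # encoded form
print(type(unicodecontent).__name__)  # Unicode form
print(roundtrip == webcontent)
```

If is_valid_utf8 returns False for your download, the problem is in the source bytes rather than in the database layer.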
As WoLpH says, I would also check the encoding settings on the database and on the database connection.