How to fix broken UTF-8 encoding in Python?

感动是毒 2021-02-08 13:48

My string is Niệm Bồ Tát (Thiá»n sÆ° Nhất Hạnh) and I want to decode it to Niệm Bồ Tát (Thiền sư Nhất Hạnh). I have seen a site that can do this: ht

3 answers
  •  长发绾君心
    2021-02-08 14:39

    I'm not sure what you can do with this kind of data, but for the example in your original post, this works:

    >>> mystr = '09. BÃ¡t NhÃ£ TÃ¢m Kinh'  # Python 2 str (bytes) holding the garbled text
    >>> s = mystr.decode('utf8').encode('latin1').decode('utf8')
    >>> s
    u'09. B\xe1t Nh\xe3 T\xe2m Kinh'
    >>> print(s)
    09. Bát Nhã Tâm Kinh
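
In Python 3, where `str` has no `decode` method, the same round trip as in the session above might look like the sketch below. The helper name `fix_mojibake` is made up for illustration, and it assumes the text was produced by reading UTF-8 bytes as Latin-1. Text garbled through another code page (Windows-1252, for example) may not round-trip, in which case this sketch returns the input unchanged.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mistakenly decoded as Latin-1.

    Hypothetical helper: returns the input unchanged when the
    round trip is not possible.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("latin1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not Latin-1 mojibake (or already correct text): leave it alone.
        return text


# Escapes spell out the garbled string '09. BÃ¡t NhÃ£ TÃ¢m Kinh'.
broken = "09. B\u00c3\u00a1t Nh\u00c3\u00a3 T\u00c3\u00a2m Kinh"
fixed = fix_mojibake(broken)
print(fixed)  # 09. Bát Nhã Tâm Kinh
```

Catching both exceptions keeps the helper safe to call on text that is already correct: such text either cannot be encoded as Latin-1 or does not form valid UTF-8 afterwards, so it passes through untouched.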
    
