How to fix broken UTF-8 encoding in Python?

感动是毒 2021-02-08 13:48

My string is Niệm Bồ Tát (Thiá»n sÆ° Nhất Hạnh) and I want to decode it to Niệm Bồ Tát (Thiền sư Nhất Hạnh). I have seen that a site can do this: ht

3 answers
  •  生来不讨喜
    2021-02-08 14:44

    The only thing that helped me with a broken Cyrillic string was ftfy: https://github.com/LuminosoInsight/python-ftfy

    This module fixes pretty much everything and works much better than online decoders.

    >>> from ftfy import fix_encoding
    >>> mystr = '09. BÃ¡t NhÃ£ TÃ¢m Kinh'
    >>> fix_encoding(mystr)
    '09. Bát Nhã Tâm Kinh'
    

    It can be installed with pip install ftfy.
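
For the simplest kind of breakage, where UTF-8 bytes were decoded as Windows-1252, the damage can often be reversed by hand with the standard library alone. The following is only a minimal sketch under that assumption: the helper name fix_mojibake and the default of cp1252 are illustrative, not part of ftfy, and the function gives up and returns the input unchanged when the round trip fails (for example when bytes were lost along the way, as in the "Thiá»n" fragment of the question).

```python
def fix_mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Undo UTF-8 text that was mistakenly decoded as `wrong_encoding`.

    Hypothetical helper for illustration; assumes a single wrong decode step.
    """
    try:
        # Recover the original bytes, then decode them as the UTF-8 they were.
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not reversible this way (already correct, or bytes are missing).
        return text


broken = "09. BÃ¡t NhÃ£ TÃ¢m Kinh"
print(fix_mojibake(broken))  # 09. Bát Nhã Tâm Kinh
```

A string that is already correct, or one that mixes correct and broken parts, is left untouched by this sketch, which is where a heuristic library such as ftfy is likely to do better.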
