Converting text containing COMBINING DIAERESIS to utf-8

长情又很酷 2021-01-13 15:43

We have some text containing German umlauts represented as a base letter plus a combining mark, e.g. 'a' + COMBINING DIAERESIS ($cc $88).

Any idea how to convert such text properly to UTF-8?

1 answer
  • 2021-01-13 16:27

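    First, if the text is not already a unicode string, decode it. Second, call unicodedata.normalize() with the 'NFC' form, which composes 'a' + COMBINING DIAERESIS into the single character 'ä'. Third, encode the result as UTF-8.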
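
The three steps can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the function name `to_composed_utf8` and the `source_encoding` parameter are hypothetical, and it assumes the input bytes are already UTF-8 (the $cc $88 in the question is the UTF-8 encoding of U+0308) and that the composed form (NFC) is the one wanted.

```python
import unicodedata


def to_composed_utf8(raw: bytes, source_encoding: str = "utf-8") -> bytes:
    """Decode, compose combining sequences (NFC), and re-encode as UTF-8.

    Both the function name and the source_encoding default are assumptions
    made for illustration; adjust the encoding to match the real input.
    """
    # Step 1: decode the bytes into a unicode string.
    text = raw.decode(source_encoding)
    # Step 2: NFC merges a base letter and a combining mark into one
    # precomposed code point wherever Unicode defines one.
    composed = unicodedata.normalize("NFC", text)
    # Step 3: encode the normalized string as UTF-8.
    return composed.encode("utf-8")


# 'a' followed by COMBINING DIAERESIS (U+0308, bytes cc 88 in UTF-8)
decomposed = b"a\xcc\x88"
print(to_composed_utf8(decomposed))  # b'\xc3\xa4', i.e. U+00E4 'ä'
```

Text that is already composed passes through unchanged, so the function should be safe to apply to mixed input.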
