What is the best way to remove accents (normalize) in a Python unicode string?

感情败类 2020-11-21 06:11

I have a Unicode string in Python, and I would like to remove all the accents (diacritics).

I found an elegant way to do this on the web (in Java):

  1. conve
8 Answers
  •  南方客 (OP)
    2020-11-21 06:43

    gensim.utils.deaccent(text) from Gensim - topic modelling for humans:

        >>> deaccent("Šéf chomutovských komunistů dostal poštou bílý prášek")
        'Sef chomutovskych komunistu dostal postou bily prasek'
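A stdlib-only sketch of the same idea may help if you don't want a Gensim dependency. It decomposes the string (NFD), drops nonspacing combining marks (Unicode category `Mn`), and recomposes (NFC). As far as I know Gensim's `deaccent` works along these lines, but this is not Gensim's code, and the function name `strip_combining_marks` is just an illustrative choice.

```python
import unicodedata


def strip_combining_marks(text: str) -> str:
    """Remove accents by decomposing and dropping nonspacing marks (category 'Mn')."""
    # NFD splits e.g. 'é' into 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep every code point except nonspacing combining marks.
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever remains into its canonical composed form.
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    print(strip_combining_marks("Šéf chomutovských komunistů dostal poštou bílý prášek"))
```

Note that letters with no Unicode decomposition (such as 'ł') pass through this function unchanged, because there is no separate combining mark to drop.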
    

    Another solution is unidecode, a third-party package that transliterates Unicode text to ASCII.
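A minimal sketch of using unidecode as an optional dependency, assuming it was installed with `pip install unidecode`. `unidecode(text)` is the package's main function. The wrapper name `to_ascii` and the fallback are illustrative assumptions: without unidecode, the fallback decomposes the text and discards anything that is still non-ASCII. That fallback is weaker, since it drops letters such as 'ł' instead of transliterating them.

```python
import unicodedata

try:
    # Third-party package: pip install unidecode
    from unidecode import unidecode
except ImportError:
    unidecode = None  # fall back to the stdlib-only approach below


def to_ascii(text: str) -> str:
    """Transliterate to ASCII, preferring unidecode when it is installed."""
    if unidecode is not None:
        return unidecode(text)
    # Fallback: decompose, then drop every code point that is not ASCII
    # (this removes combining accents, but also letters with no decomposition).
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("café"))
```

Both code paths agree on ordinary accented Latin letters like "café". They differ on letters such as 'ł', which is why unidecode is usually preferred when transliteration quality matters.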

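    Note that the suggested solution with unicodedata removes accents only from some characters: those that Unicode decomposes into a base letter plus combining marks. Letters such as 'ł' have no decomposition, so it turns 'ł' into '' rather than into 'l'.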
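The 'ł' problem can be reproduced with the common "decompose, then encode to ASCII with `ignore`" recipe. This sketch shows it and then patches it with a small translation table applied first. The table `EXTRA_ASCII` and both function names are hypothetical and deliberately incomplete; they only cover a few stroke letters (ł, ø, đ) that NFKD cannot split.

```python
import unicodedata

# Hypothetical, deliberately incomplete table for letters that have no
# Unicode decomposition, so normalization cannot separate an accent from them.
EXTRA_ASCII = str.maketrans({"ł": "l", "Ł": "L", "ø": "o", "Ø": "O", "đ": "d", "Đ": "D"})


def ascii_ignore(text: str) -> str:
    """Common unicodedata recipe: decompose, then drop every non-ASCII code point."""
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def ascii_with_extras(text: str) -> str:
    """Same recipe, but first map a few stroke letters to their base letter."""
    return ascii_ignore(text.translate(EXTRA_ASCII))


if __name__ == "__main__":
    print(ascii_ignore("Łódź"))       # 'Ł' has no decomposition, so it vanishes: odz
    print(ascii_with_extras("Łódź"))  # 'Ł' is mapped to 'L' first: Lodz
```

A hand-made table like this only fixes the characters you list. For broad coverage, a transliteration library such as unidecode is the more practical choice.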
