I have a Unicode string in Python, and I would like to remove all the accents (diacritics).
I found on the web an elegant way to do this (in Java):
gensim.utils.deaccent(text) from Gensim - topic modelling for humans:
'Sef chomutovskych komunistu dostal postou bily prasek'
Another solution is unidecode.
Note that the suggested solution with unicodedata typically removes accents only in some character (e.g. it turns 'ł'
into ''
, rather than into 'l'
).