I have a Unicode string in Python, and I would like to remove all the accents (diacritics).
I found on the web an elegant way to do this (in Java):
Some languages have combining diacritics as language letters and accent diacritics to specify accent.
I think it is more safe to specify explicitly what diactrics you want to strip:
def strip_accents(string, accents=('COMBINING ACUTE ACCENT', 'COMBINING GRAVE ACCENT', 'COMBINING TILDE')):
accents = set(map(unicodedata.lookup, accents))
chars = [c for c in unicodedata.normalize('NFD', string) if c not in accents]
return unicodedata.normalize('NFC', ''.join(chars))