What is the best way to remove accents (normalize) in a Python unicode string?

感情败类 2020-11-21 06:11

I have a Unicode string in Python, and I would like to remove all the accents (diacritics).

I found on the web an elegant way to do this (in Java):

  1. conve
8 answers
  •  野的像风
    2020-11-21 06:49

    In some languages, combining diacritics are part of the letters themselves, while accent diacritics only mark stress.

    I think it is safer to specify explicitly which diacritics you want to strip:

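    import unicodedata

    def strip_accents(string, accents=('COMBINING ACUTE ACCENT', 'COMBINING GRAVE ACCENT', 'COMBINING TILDE')):
        # Resolve the Unicode character names to the combining characters themselves.
        accents = set(map(unicodedata.lookup, accents))
        # Decompose (NFD), drop only the listed marks, then recompose (NFC).
        chars = [c for c in unicodedata.normalize('NFD', string) if c not in accents]
        return unicodedata.normalize('NFC', ''.join(chars))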
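
A quick usage sketch may help show why the explicit list matters. The function is repeated here so the snippet runs on its own; the sample words are my own illustrations, not from the original post. The Russian letter й decomposes to и plus COMBINING BREVE, which is not in the default list, so it survives, while a stress mark (COMBINING ACUTE ACCENT) on a vowel is removed.

```python
import unicodedata

def strip_accents(string, accents=('COMBINING ACUTE ACCENT',
                                   'COMBINING GRAVE ACCENT',
                                   'COMBINING TILDE')):
    # Same approach as above: look up the marks by name, filter them out of
    # the decomposed string, then recompose what is left.
    accents = set(map(unicodedata.lookup, accents))
    chars = [c for c in unicodedata.normalize('NFD', string) if c not in accents]
    return unicodedata.normalize('NFC', ''.join(chars))

# Acute accent is in the default list, so it is stripped.
plain = strip_accents('caf\u00e9')

# Stress mark removed, but the breve that is part of the letter й is kept.
russian = strip_accents('\u0447\u0430\u0301\u0439\u043a\u0430')

# Diaeresis is not in the default list, so ü is left alone ...
kept = strip_accents('\u00fcber')

# ... unless it is requested explicitly (illustrative choice of mark).
stripped = strip_accents('\u00fcber', accents=('COMBINING DIAERESIS',))
```

Passing a different tuple of Unicode character names is enough to change which marks are removed, so each caller can decide what counts as an "accent" for its language.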
    
