What is the best way to remove accents (normalize) in a Python unicode string?

感情败类 2020-11-21 06:11

I have a Unicode string in Python, and I would like to remove all the accents (diacritics).

I found an elegant way to do this on the web (in Java):

  1. conve
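The question text is cut off here. Assuming the recipe is the usual "decompose each accented letter, then drop the separate diacritic characters" approach that the title's "normalize" suggests, a rough Python 3 sketch of that idea follows. It is an illustration, not the Java code from the question. It uses NFD decomposition and keeps every character whose unicodedata.category() is not "Mn" (nonspacing mark). The function name strip_diacritics is hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: decompose with NFD, then drop nonspacing marks."""
    # NFD splits a precomposed letter such as 'é' (U+00E9) into
    # 'e' (U+0065) followed by U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep every character that is not a nonspacing mark (category "Mn").
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    print(strip_diacritics("Montréal, über, Mère, Françoise, noël"))
```

Letters that have no canonical decomposition, such as ø or ł, pass through this filter unchanged.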
8 answers
  •  隐瞒了意图╮
    2020-11-21 06:57

    In response to @MiniQuark's answer:

    I was trying to read in a CSV file that was partly French text (containing accents) and partly strings that would eventually become integers and floats. As a test, I created a test.txt file that looked like this:

    Montréal, über, 12.89, Mère, Françoise, noël, 889

    I had to include lines 2 and 3 (reload(sys) and sys.setdefaultencoding("utf-8"), which I found in a Python ticket) to get it to work, and also incorporate @Jabba's comment:

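    import sys
    reload(sys)  # Python 2 only: restores setdefaultencoding, which site.py deletes at startup
    sys.setdefaultencoding("utf-8")  # implicit str -> unicode conversions now use UTF-8, not ASCII
    import csv
    import unicodedata
    
    def remove_accents(input_str):
        # NFKD splits a precomposed letter (e.g. U+00E9, e with acute) into
        # its base letter plus a combining mark (e.g. U+0301 COMBINING ACUTE ACCENT)
        nfkd_form = unicodedata.normalize('NFKD', unicode(input_str))
        # keep only the characters that are not combining marks
        return u"".join(c for c in nfkd_form if not unicodedata.combining(c))
    
    with open('test.txt') as f:
        # skipinitialspace=True drops the space after each comma in test.txt
        reader = csv.reader(f, skipinitialspace=True)
        for row in reader:
            for element in row:
                print remove_accents(element)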
    

    The result:

    Montreal
    uber
    12.89
    Mere
    Francoise
    noel
    889
    

    (Note: I am on Mac OS X 10.8.4 and using Python 2.7.3)
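On Python 3 the reload(sys) / sys.setdefaultencoding workaround is not needed (and no longer works: sys.setdefaultencoding does not exist and reload is not a builtin), because str is already Unicode and the file's encoding can be given to open(). Below is a minimal sketch of the same NFKD + unicodedata.combining() approach, assuming test.txt is UTF-8. The helper name rows_without_accents is hypothetical, and io.StringIO stands in for the file so the snippet runs on its own.

```python
import csv
import io
import unicodedata
from typing import Iterable, Iterator, List


def remove_accents(input_str: str) -> str:
    """NFKD-decompose, then drop combining marks (same idea as the Python 2 code)."""
    nfkd_form = unicodedata.normalize("NFKD", input_str)
    return "".join(c for c in nfkd_form if not unicodedata.combining(c))


def rows_without_accents(lines: Iterable[str]) -> Iterator[List[str]]:
    """Hypothetical helper: yield each CSV row with accents removed from every field."""
    # csv.reader accepts any iterable of text lines, e.g. an open file.
    # skipinitialspace=True drops the blank after each comma in "a, b".
    for row in csv.reader(lines, skipinitialspace=True):
        yield [remove_accents(field) for field in row]


if __name__ == "__main__":
    # Stand-in for open("test.txt", encoding="utf-8", newline="");
    # the sample line is the test.txt content shown in the answer.
    sample = io.StringIO("Montréal, über, 12.89, Mère, Françoise, noël, 889\n")
    for row in rows_without_accents(sample):
        for element in row:
            print(element)
```

To read the real file, pass open("test.txt", encoding="utf-8", newline="") instead of the StringIO object; the csv module documentation recommends newline="" for files handed to csv.reader.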
