What is the best way to remove accents (normalize) in a Python unicode string?

感情败类 2020-11-21 06:11

I have a Unicode string in Python, and I would like to remove all the accents (diacritics).

I found on the web an elegant way to do this (in Java):

conve

8 Answers

隐瞒了意图╮ (OP)

2020-11-21 06:57
In response to @MiniQuark's answer:

I was trying to read a CSV file that was half French (with accented characters) and also contained strings that would eventually become integers and floats. As a test, I created a test.txt file that looked like this:

Montréal, über, 12.89, Mère, Françoise, noël, 889

To get it to work, I had to include lines 2 and 3 (which I found in a Python ticket) and incorporate @Jabba's comment:
```
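import sys
reload(sys)  # Python 2 only: re-exposes setdefaultencoding, which site.py removes
sys.setdefaultencoding("utf-8")  # lets unicode() decode the UTF-8 bytes from the file
import csv
import unicodedata

def remove_accents(input_str):
    # NFKD splits each accented character into a base character plus combining marks
    nfkd_form = unicodedata.normalize('NFKD', unicode(input_str))
    # Keep only the characters that are not combining marks
    return u"".join([c for c in nfkd_form if not unicodedata.combining(c)])

with open('test.txt') as f:
    read = csv.reader(f)
    for row in read:
        for element in row:
            print remove_accents(element)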
```
The result:
```
Montreal
uber
12.89
Mere
Francoise
noel
889
```
(Note: I am on Mac OS X 10.8.4 and using Python 2.7.3)
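For anyone on Python 3, here is a rough sketch of the same idea. It is not part of what I originally ran: it assumes Python 3, where `str` is already Unicode, so the `reload(sys)` / `setdefaultencoding` lines should not be needed. The `io.StringIO` object is only a stand-in for test.txt, and `skipinitialspace=True` is my own addition to drop the space after each comma.

```python
import csv
import io
import unicodedata


def remove_accents(text):
    """Return text with combining marks (accents) stripped."""
    # NFKD splits each accented character into a base character
    # followed by one or more combining marks.
    nfkd_form = unicodedata.normalize("NFKD", text)
    return "".join(c for c in nfkd_form if not unicodedata.combining(c))


# Stand-in for test.txt (assumption). With a real file, something like
# open("test.txt", encoding="utf-8", newline="") would take its place.
sample = io.StringIO("Montréal, über, 12.89, Mère, Françoise, noël, 889\n")

cleaned = [
    remove_accents(element)
    for row in csv.reader(sample, skipinitialspace=True)
    for element in row
]
print(cleaned)
```

The numeric fields pass through unchanged, so they can still be converted with `int()` or `float()` afterwards.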