Question
I have a couple of text files containing characters with diacritical marks, for example è, á, ô and so on. I'd like to replace these characters with e, a, o, etc.
How can I achieve this in Python? Grateful for help!
Answer 1:
Try unidecode (you may need to install it).
>>> from unidecode import unidecode
>>> s = u"é"
>>> unidecode(s)
'e'
Answer 2:
Example of what you could do:
accented_string = u'Málaga'
# accented_string is of type 'unicode'
import unidecode
unaccented_string = unidecode.unidecode(accented_string)
# unaccented_string contains 'Malaga' and is of type 'str'
This is very similar to your problem. See also: What is the best way to remove accents in a Python unicode string?
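If installing a third-party package is not an option, a rough alternative is to use the standard library's unicodedata module instead of unidecode. The sketch below decomposes each character (NFD normalization) and drops the combining marks; the helper name strip_accents is made up for illustration. Note that this is a different technique from unidecode: letters with no canonical decomposition (such as ø) pass through unchanged, whereas unidecode transliterates them.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining diacritical marks from text (hypothetical helper)."""
    # NFD splits a character such as 'á' into 'a' plus a combining accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Málaga"))
# Malaga
```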
Answer 3:
In Python 3, you simply need to use the unidecode package. It works with both lowercase and uppercase letters.
Installing the package (you may need to use pip3 instead of pip, depending on your system and setup):
$ pip install unidecode
Then use it as follows:
from unidecode import unidecode
text = ["ÉPÍU", "Naïve Café", "EL NIÑO"]
text1 = [unidecode(s) for s in text]
print(text1)
# ['EPIU', 'Naive Cafe', 'EL NINO']
text2 = [unidecode(s.lower()) for s in text]
print(text2)
# ['epiu', 'naive cafe', 'el nino']
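Since the question is about text files rather than single strings, here is a minimal sketch of applying a conversion function to a whole file. The function name convert_file, the file names, and the UTF-8 default are assumptions for illustration; the conversion itself is passed in as a parameter, so unidecode can be supplied as shown in the trailing comment.

```python
from typing import Callable


def convert_file(src: str, dst: str, convert: Callable[[str], str],
                 encoding: str = "utf-8") -> None:
    """Read src line by line, apply convert, and write the result to dst.

    Hypothetical helper; the encoding is assumed to be UTF-8 unless given.
    """
    with open(src, encoding=encoding) as fin, \
            open(dst, "w", encoding=encoding) as fout:
        for line in fin:
            fout.write(convert(line))


# Example usage (assumes unidecode is installed and input.txt exists):
#     from unidecode import unidecode
#     convert_file("input.txt", "output.txt", unidecode)
```

Writing to a separate output file keeps the original intact, which is handy if the transliteration turns out to be lossier than expected.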
Source: https://stackoverflow.com/questions/48445459/removing-diacritical-marks-using-python