Question
I would like to replace all the French letters within words with their ASCII equivalents.
letters = [['é', 'à'], ['è', 'ù'], ['â', 'ê'], ['î', 'ô'], ['û', 'ç']]
for x in letters:
    for a in x:
        a = a.replace('é', 'e')
        a = a.replace('à', 'a')
        a = a.replace('è', 'e')
        a = a.replace('ù', 'u')
        a = a.replace('â', 'a')
        a = a.replace('ê', 'e')
        a = a.replace('î', 'i')
        a = a.replace('ô', 'o')
        a = a.replace('û', 'u')
        a = a.replace('ç', 'c')
print(letters[0][0])
However, this code prints é. How can I make this work?
Answer 1:
May I suggest you consider using translation tables.
translationTable = str.maketrans("éàèùâêîôûç", "eaeuaeiouc")
test = "Héllô Càèùverâêt Jîôûç"
test = test.translate(translationTable)
print(test)
This will print Hello Caeuveraet Jiouc. Pardon my French.
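If some replacements need more than one character (for example 'œ' to 'oe'), str.maketrans also accepts a dict. Here is a minimal sketch of that variant; the name FRENCH_TO_ASCII, the helper to_ascii and the exact set of letters are my own choices for illustration, not a complete list.

```python
# Hypothetical mapping for illustration; extend it with whatever letters you need.
FRENCH_TO_ASCII = str.maketrans({
    "é": "e", "è": "e", "ê": "e", "ë": "e",
    "à": "a", "â": "a",
    "î": "i", "ï": "i",
    "ô": "o",
    "ù": "u", "û": "u", "ü": "u",
    "ç": "c",
    "œ": "oe",  # a single character may map to a longer string
    "É": "E", "À": "A", "Ç": "C",
})


def to_ascii(text):
    """Return text with every mapped letter replaced (hypothetical helper)."""
    return text.translate(FRENCH_TO_ASCII)


print(to_ascii("Çà et là, un cœur français"))  # Ca et la, un coeur francais
```

Characters that are not in the dict are left untouched, so anything missing from the mapping simply passes through.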
Answer 2:
You can also use unidecode. Install it with pip install unidecode, then do:
from unidecode import unidecode
s = "Héllô Càèùverâêt Jîôûç ïîäüë"
s = unidecode(s)
print(s) # Hello Caeuveraet Jiouc iiaue
The result will be the same string, but with the French characters converted to their ASCII equivalents: Hello Caeuveraet Jiouc iiaue
Answer 3:
The replace function returns the string with the character replaced.
In your code you don't store this return value.
The lines in your loop should be a = a.replace('é', 'e').
You also need to store that output so you can print it in the end.
Edit: This post explains how variables within loops are accessed.
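To make the point about storing the output concrete, here is one possible sketch. The names replacements and strip_accents are mine, not from the question. Assigning to the loop variable a only rebinds that name, so this version builds new inner lists and assigns them back to letters.

```python
letters = [['é', 'à'], ['è', 'ù'], ['â', 'ê'], ['î', 'ô'], ['û', 'ç']]

# Same substitutions as in the question, collected in one dict (hypothetical name).
replacements = {'é': 'e', 'à': 'a', 'è': 'e', 'ù': 'u', 'â': 'a',
                'ê': 'e', 'î': 'i', 'ô': 'o', 'û': 'u', 'ç': 'c'}


def strip_accents(word):
    """Apply every replacement and return the resulting string."""
    for accented, plain in replacements.items():
        word = word.replace(accented, plain)
    return word


# Strings are immutable, so store the returned values in new lists.
letters = [[strip_accents(a) for a in x] for x in letters]
print(letters[0][0])  # e
```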
Answer 4:
Here's another solution, using the low-level Unicode module called unicodedata.
In Unicode, a character like 'ù' has a canonical decomposition: it is made of the character 'u' and another character called 'COMBINING GRAVE ACCENT', which is basically the '̀'. Using the decomposition function in unicodedata, one can obtain the code points (in hex) of these two parts.
>>> import unicodedata as ud
>>> ud.decomposition('ù')
'0075 0300'
>>> chr(0x0075)
'u'
>>> chr(0x0300)
'̀'
Therefore, to retrieve 'u' from 'ù', we can first split the string, then use the built-in int function for the conversion (see this thread for converting a hex string to an integer), and then get the character using the chr function.
import unicodedata as ud

def get_ascii_char(c):
    s = ud.decomposition(c)
    if s == '':  # an indecomposable character returns ''
        return c
    # The first field is the base character's code point in hex.
    code = int(s.split()[0], 16)
    return chr(code)
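The function above works on one character at a time. For whole strings, a related approach (not the method above, but built on the same decomposition idea) is to normalize to NFD and drop the combining marks. A minimal sketch, where strip_accents is a name I made up:

```python
import unicodedata


def strip_accents(text):
    """Decompose text (NFD) and drop the combining marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a nonzero class for combining characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Héllô Càèùverâêt Jîôûç"))  # Hello Caeuveraet Jiouc
```

Note that this only removes accents. Letters without a decomposition, such as 'œ', are returned unchanged, so the result is not guaranteed to be pure ASCII.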
I'm new to the Unicode representation and utilities in Python. If anyone has any suggestions for improving this piece of code, I'll be very happy to hear them!
Cheers!
Source: https://stackoverflow.com/questions/41004941/python-replace-french-letters-with-english