Question
I tried Python's replace method, but it didn't work. Here is my list:
my_list=[['the',
'production',
'business',
'environmentâ\xa0evaluating',
'the'],
['impact',
'of',
'the',
'environmental',
'influences',
'such'],
['as',
'political',
'economic',
'technological',
'sociodemographicâ\xa0']]
my_list.replace(u'\xa0', ' ')
and
my_list[0].replace(u'\xa0', ' ')
Both attempts fail with this error: AttributeError: 'list' object has no attribute 'replace'
How can I remove the unwanted 'â\xa0' from the strings in my_list?
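The error happens because replace is a method of str, not of list, so it has to be called on each string inside the nested lists. Before removing anything, it can help to see what the unwanted characters actually are. The sketch below assumes the 'â' in the data is the single precomposed character U+00E2 (written as '\u00e2' so the example does not depend on how the file is encoded); describe_non_ascii is a hypothetical helper name.

```python
import unicodedata


def describe_non_ascii(text):
    """Return 'U+XXXX NAME' for every non-ASCII character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text if not ch.isascii()]


# Same as 'environmentâ\xa0evaluating' from the question.
for line in describe_non_ascii('environment\u00e2\xa0evaluating'):
    print(line)
# U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX
# U+00A0 NO-BREAK SPACE
```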
Answer 1:
Use the unicodedata library. That way you can keep more information from each word.
import unicodedata
final_list = [[unicodedata.normalize("NFKD", word) for word in ls] for ls in my_list]
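To see what NFKD normalization does to one of the problem words, here is a small check (again assuming 'â' is the precomposed U+00E2). NFKD applies compatibility decomposition: the no-break space becomes an ordinary space, and 'â' is split into a plain 'a' followed by a combining circumflex.

```python
import unicodedata

word = 'environment\u00e2\xa0evaluating'  # 'environmentâ\xa0evaluating' from the question
normalized = unicodedata.normalize("NFKD", word)

# ascii() escapes non-ASCII characters so the combining mark is visible.
print(ascii(normalized))  # 'environmenta\u0302 evaluating'
```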
To also turn â into a, encode to ASCII and ignore the characters that cannot be encoded:
very_final_list = [[word.encode('ascii', 'ignore') for word in ls] for ls in final_list]
If you want to remove â completely instead, you can use:
# remove 'â' before NFKD splits it into 'a' + a combining circumflex
very_final_list = [[unicodedata.normalize("NFKD", word.replace('â', '')) for word in ls] for ls in my_list]
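One caveat, assuming the 'â' typed in the replace call is the precomposed character U+00E2: after NFKD the string no longer contains that character, because it has been split into 'a' plus U+0302 COMBINING CIRCUMFLEX ACCENT. Replacing 'â' on the normalized text therefore finds nothing, so the replace has to run before normalization. A quick check:

```python
import unicodedata

word = 'environment\u00e2\xa0evaluating'  # 'environmentâ\xa0evaluating'
normalized = unicodedata.normalize("NFKD", word)

# After NFKD there is no precomposed 'â' (U+00E2) left to replace.
print('\u00e2' in normalized)  # False

# Removing 'â' first, then normalizing, gives the intended result.
cleaned = unicodedata.normalize("NFKD", word.replace('\u00e2', ''))
print(repr(cleaned))  # 'environment evaluating'
```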
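With the encode approach, the result is bytes rather than str, so every string is printed with a b'' prefix. To remove it, decode the bytes back to a str with .decode('utf-8').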
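A short illustration of the bytes/str round trip on one word: encode('ascii', 'ignore') returns bytes with the non-ASCII code points dropped, and decode turns those bytes back into an ordinary string.

```python
import unicodedata

# NFKD first, so 'â' becomes 'a' + U+0302 and the no-break space becomes ' '.
word = unicodedata.normalize("NFKD", 'sociodemographic\u00e2\xa0')

as_bytes = word.encode('ascii', 'ignore')  # bytes: non-ASCII code points are dropped
as_text = as_bytes.decode('utf-8')         # back to str, so no b'' prefix when printed

print(type(as_bytes).__name__, as_bytes)       # bytes b'sociodemographica '
print(type(as_text).__name__, repr(as_text))   # str 'sociodemographica '
```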
Putting everything together:
import unicodedata
final_list = [[unicodedata.normalize("NFKD", word) for word in ls] for ls in my_list]
very_final_list = [[word.encode('ascii', 'ignore').decode('utf-8') for word in ls] for ls in final_list]
#very_final_list = [[unicodedata.normalize("NFKD", word.replace('â', '')) for word in ls] for ls in my_list]
And here is the final result:
[['the', 'production', 'business', 'environmenta evaluating', 'the'], ['impact', 'of', 'the', 'environmental', 'influences', 'such'], ['as', 'political', 'economic', 'technological', 'sociodemographica ']]
If you use the commented-out very_final_list statement instead, this is the output:
[['the', 'production', 'business', 'environment evaluating', 'the'], ['impact', 'of', 'the', 'environmental', 'influences', 'such'], ['as', 'political', 'economic', 'technological', 'sociodemographic ']]
Answer 2:
lst = []
for l in my_list:
    lst.append([s.replace(u'\xa0', '') for s in l])
Output:
[['the', 'production', 'business', 'environmentâevaluating', 'the'],
['impact', 'of', 'the', 'environmental', 'influences', 'such'],
['as', 'political', 'economic', 'technological', 'sociodemographicâ']]
Hmm, I think the other answer breaks the structure of my_list, although it is easy too: only one line.
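The loop above only strips '\xa0', so the 'â' survives. If both characters should be handled in a single pass, str.translate with a table built by str.maketrans is one option. This sketch assumes 'â' should be deleted and the no-break space turned into an ordinary space; the nested comprehension keeps the list-of-lists shape.

```python
# Translation table: delete 'â' (U+00E2), turn the no-break space into ' '.
table = str.maketrans({'\u00e2': None, '\xa0': ' '})

my_list = [['the', 'production', 'business', 'environment\u00e2\xa0evaluating', 'the'],
           ['impact', 'of', 'the', 'environmental', 'influences', 'such'],
           ['as', 'political', 'economic', 'technological', 'sociodemographic\u00e2\xa0']]

# Apply the table to every string while keeping the nested structure.
cleaned = [[s.translate(table) for s in row] for row in my_list]
print(cleaned)
```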
Answer 3:
Update: a nested list comprehension (one list comprehension inside another) should work for you:
[[w.replace("â\xa0", " ") for w in words] for words in my_list]
Output
[['the', 'production', 'business', 'environment evaluating', 'the'],
['impact', 'of', 'the', 'environmental', 'influences', 'such'],
['as', 'political', 'economic', 'technological', 'sociodemographic ']]
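The comprehension above assumes exactly two levels of nesting. If the data could be nested more deeply (an assumption, not something the question states), a small recursive helper keeps whatever structure is there. clean_nested is a hypothetical name, and its defaults mirror the "â\xa0" to " " replacement used above.

```python
def clean_nested(obj, old="\u00e2\xa0", new=" "):
    """Recursively replace `old` with `new` in every string inside nested lists."""
    if isinstance(obj, str):
        return obj.replace(old, new)
    if isinstance(obj, list):
        return [clean_nested(item, old, new) for item in obj]
    return obj  # leave non-string, non-list values unchanged


data = [['environment\u00e2\xa0evaluating'], [['sociodemographic\u00e2\xa0'], 42]]
print(clean_nested(data))  # [['environment evaluating'], [['sociodemographic '], 42]]
```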
Source: https://stackoverflow.com/questions/53222476/how-to-remove-the-%c3%a2-xa0-from-list-of-strings-in-python