How to remove â\xa0 from a list of strings in Python

喜欢而已 submitted on 2021-02-11 17:46:01

Question


I tried using replace in Python, but it did not work.

my_list=[['the',
 'production',
 'business',
 'environmentâ\xa0evaluating',
 'the'],
 ['impact',
 'of',
 'the',
 'environmental',
 'influences',
 'such'],
 ['as',
 'political',
 'economic',
 'technological',
 'sociodemographicâ\xa0']]

my_list.replace(u'\xa0', ' ') and

my_list[0].replace(u'\xa0', ' ')  

For both of these I got an attribute error: AttributeError: 'list' object has no attribute 'replace'. How can I remove this unwanted string from the list my_list?


Answer 1:


Use the unicodedata library. That way you preserve more information from each word.

import unicodedata
final_list = [[unicodedata.normalize("NFKD", word) for word in ls] for ls in my_list]

To also replace â with a:

very_final_list = [[word.encode('ascii', 'ignore') for word in ls] for ls in final_list]

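If you want to remove â completely instead, note that after NFKD normalization it is stored as 'a' followed by the combining circumflex (U+0302), so replace that decomposed pair:

very_final_list = [[word.replace('a\u0302', '') for word in ls] for ls in final_list]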

To remove the b' prefix in front of every string, decode the bytes back to utf-8.

So putting everything together,

import unicodedata
final_list = [[unicodedata.normalize("NFKD", word) for word in ls] for ls in my_list]
very_final_list = [[word.encode('ascii', 'ignore').decode('utf-8') for word in ls] for ls in final_list]
#very_final_list = [[word.replace('a\u0302', '') for word in ls] for ls in final_list]

And here is the final result:

[['the', 'production', 'business', 'environmenta evaluating', 'the'], ['impact', 'of', 'the', 'environmental', 'influences', 'such'], ['as', 'political', 'economic', 'technological', 'sociodemographica ']]

If you use the commented-out very_final_list statement instead, this is the output:

[['the', 'production', 'business', 'environment evaluating', 'the'], ['impact', 'of', 'the', 'environmental', 'influences', 'such'], ['as', 'political', 'economic', 'technological', 'sociodemographic ']]
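To see why the two variants give different results, it can help to inspect what NFKD normalization does to the offending characters. The following is only a sketch: the sample string is hypothetical, and it assumes the â in the data is the precomposed code point U+00E2.

```python
import unicodedata

# Hypothetical sample; assumes 'â' is the precomposed code point U+00E2.
sample = "environment\u00e2\u00a0evaluating"

normalized = unicodedata.normalize("NFKD", sample)

# NFKD splits 'â' into 'a' + COMBINING CIRCUMFLEX ACCENT (U+0302)
# and maps NO-BREAK SPACE (U+00A0) to an ordinary space.
# "environment" is 11 characters long, so the interesting part starts at index 11.
decomposed = [unicodedata.name(ch) for ch in normalized[11:14]]
print(decomposed)

# Dropping non-ASCII characters then discards only the combining accent,
# which is why a plain 'a' is left behind.
ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
print(ascii_only)
```

If the leftover 'a' is not wanted, the accent has to be removed together with its base letter, as in the replace variant.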



Answer 2:


lst = []
for l in my_list:
    lst.append([s.replace(u'\xa0','') for s in l])

Output:

[['the', 'production', 'business', 'environmentâevaluating', 'the'],
 ['impact', 'of', 'the', 'environmental', 'influences', 'such'],
 ['as', 'political', 'economic', 'technological', 'sociodemographicâ']]

Hmm, I think the other answer breaks the structure of my_list. But it is easy too: only one line.
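For context, the AttributeError in the question comes from the fact that replace is a method of str, not of list, so it has to be called on each string inside the nested lists. A minimal sketch, using a shortened, hypothetical copy of the data:

```python
# Shortened, hypothetical copy of the question's data.
sample_list = [["environment\u00e2\u00a0evaluating", "the"],
               ["sociodemographic\u00e2\u00a0"]]

# Lists do not define replace(); only the strings inside them do.
list_has_replace = hasattr(sample_list, "replace")
str_has_replace = hasattr(sample_list[0][0], "replace")
print(list_has_replace, str_has_replace)

# Rebuild each inner list, calling replace() on every string.
# This removes only the no-break space; the 'â' stays in place.
cleaned = [[s.replace("\u00a0", "") for s in inner] for inner in sample_list]
print(cleaned)
```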




Answer 3:


Updated: a nested list comprehension should make this work for you

[[w.replace("â\xa0", " ") for w in words] for words in my_list]

Output

[['the', 'production', 'business', 'environment evaluating', 'the'],
['impact', 'of', 'the', 'environmental', 'influences', 'such'],
['as', 'political', 'economic', 'technological', 'sociodemographic ']]
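The output above still has a trailing space in 'sociodemographic '. If that is unwanted, one option is to strip each word after the replacement. The helper name clean_words is hypothetical, and the sketch assumes the unwanted sequence is exactly 'â\xa0':

```python
def clean_words(nested, junk="\u00e2\u00a0"):
    """Return a new list of lists with `junk` replaced by a space and
    leading/trailing whitespace stripped from every word."""
    return [[word.replace(junk, " ").strip() for word in words]
            for words in nested]


# Shortened, hypothetical copy of the question's data.
sample_list = [["business", "environment\u00e2\u00a0evaluating"],
               ["technological", "sociodemographic\u00e2\u00a0"]]

print(clean_words(sample_list))
```

The original list is left unchanged, because the comprehension builds new inner lists.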


Source: https://stackoverflow.com/questions/53222476/how-to-remove-the-%c3%a2-xa0-from-list-of-strings-in-python
