How to remove English text from an Arabic string in Python?

Submitted by 醉酒当歌 on 2021-01-22 08:52:34

Question


I have an Arabic string that also contains English text and punctuation. I need to keep only the Arabic text, so I tried removing the punctuation and the English words using the string module. However, I lost the spacing between the Arabic words. Where did I go wrong?

import string
exclude = set(string.punctuation)

main_text = "وزارة الداخلية: لا تتوفر لدينا معلومات رسمية عن سعوديين موقوفين في ليبيا http://alriyadh.com/1031499"
main_text = ''.join(ch for ch in main_text if ch not in exclude)
# Output after this step: "وزارة الداخلية لا تتوفر لدينا معلومات رسمية عن سعوديين موقوفين في ليبيا httpalriyadhcom1031499"
n = ''.join(filter(lambda x: x not in string.printable, main_text))
print(n)
# Prints: وزارةالداخليةلاتتوفرلدينامعلوماترسميةعنسعوديينموقوفينفيليبيا

I am able to remove the punctuation and the English text, but I lose the spaces between the words. How can I keep the words separate?


Answer 1:


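You can keep the spaces in your string by using

n = ''.join(filter(lambda x: True if x == ' ' else x not in string.printable, main_text))

or

n = ''.join(filter(lambda x: x == ' ' or x not in string.printable, main_text))

The spaces were lost because string.printable includes whitespace, so the original filter dropped them along with the English letters and digits. Each version above keeps a character if it is a space; otherwise it keeps the character only if it is not in string.printable.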
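
As a minimal sketch of the same idea written for Python 3, the filter can be wrapped in a small function. The helper name keep_non_ascii_and_spaces and the shortened sample string are illustrative assumptions, not part of the original answer. Because string.punctuation is a subset of string.printable, a single pass also removes the punctuation, so the separate punctuation step is not needed here.

```python
import string


# Hypothetical helper name, chosen for illustration only.
def keep_non_ascii_and_spaces(text):
    """Keep plain spaces and every character that is not in string.printable."""
    return ''.join(ch for ch in text if ch == ' ' or ch not in string.printable)


# Shortened sample (an assumption): Arabic words, a colon, an English word, a URL.
sample = "وزارة الداخلية: لا تتوفر abc http://alriyadh.com/1031499"
result = keep_non_ascii_and_spaces(sample)
print(result)
```

Note that this only filters ASCII characters. Any non-ASCII character that is not Arabic (for example an accented Latin letter) would also be kept, and the spaces that surrounded the removed English words remain in the result.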




Answer 2:


You can stop the filter from removing any whitespace (spaces, tabs, and newlines) as follows:

n = ''.join(filter(lambda x: x in string.whitespace or x not in string.printable, main_text))
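
The whitespace-preserving variant can be sketched for Python 3 as below. The function name strip_ascii_keep_whitespace, the sample string, and the final normalization step are assumptions added for illustration. Removing English words and URLs tends to leave stray or doubled whitespace behind, so the sketch collapses each run of whitespace into a single space, which may or may not be wanted in a given case.

```python
import string


# Hypothetical helper name, chosen for illustration only.
def strip_ascii_keep_whitespace(text):
    """Drop printable ASCII except whitespace, then normalize the spacing."""
    kept = ''.join(
        ch for ch in text
        if ch in string.whitespace or ch not in string.printable
    )
    # split() with no arguments splits on any whitespace run and drops
    # empty pieces, so joining with ' ' collapses tabs and repeated spaces.
    return ' '.join(kept.split())


# Sample (an assumption): a tab separates two of the words.
sample = "وزارة الداخلية:\tلا تتوفر http://alriyadh.com/1031499"
cleaned = strip_ascii_keep_whitespace(sample)
print(cleaned)
```

Checking against string.whitespace rather than only ' ' means words separated by a tab or a newline are not fused together before the spacing is normalized.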


Source: https://stackoverflow.com/questions/29406247/how-to-remove-english-text-from-arabic-string-in-python
