Question
I have an Arabic string that also contains English text and punctuation. I want to keep only the Arabic text, so I tried removing the punctuation and the English words using the string module. However, I lost the spacing between the Arabic words. Where am I going wrong?
import string
exclude = set(string.punctuation)
main_text = "وزارة الداخلية: لا تتوفر لدينا معلومات رسمية عن سعوديين موقوفين في ليبيا http://alriyadh.com/1031499"
main_text = ''.join(ch for ch in main_text if ch not in exclude)
[Output after this step: "وزارة الداخلية لا تتوفر لدينا معلومات رسمية عن سعوديين موقوفين في ليبيا httpalriyadhcom1031499"]
n = filter(lambda x: x not in string.printable, main_text)
print n
وزارةالداخليةلاتتوفرلدينامعلوماترسميةعنسعوديينموقوفينفيليبيا
I am able to remove the punctuation and the English text, but I lose the spaces between words. How can I keep the words separated?
Answer 1:
You can keep the spaces in your string by letting them through the filter explicitly:
n = filter(lambda x: True if x==' ' else x not in string.printable , main_text)
or
n = filter(lambda x: x==' ' or x not in string.printable , main_text)
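The lambda first checks whether the character is a space and keeps it if so; otherwise it keeps the character only if it is not in string.printable. The original filter dropped the spaces because the space character is itself part of string.printable.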
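The snippets above rely on Python 2, where filter applied to a str returns a str. As a minimal sketch of the same idea for Python 3 (where filter returns an iterator), the function name keep_non_ascii_and_spaces and the shortened sample text are illustrative assumptions, not part of the original question:

```python
import string

def keep_non_ascii_and_spaces(text):
    # string.printable contains only ASCII characters (digits, letters,
    # punctuation, whitespace), so "not in string.printable" keeps every
    # non-ASCII character, which here means the Arabic letters.
    # The explicit ch == ' ' test lets ordinary spaces through as well.
    return ''.join(ch for ch in text if ch == ' ' or ch not in string.printable)

# Shortened version of the question's string (hypothetical sample)
main_text = "وزارة الداخلية: لا تتوفر http://alriyadh.com/1031499"
result = keep_non_ascii_and_spaces(main_text)
print(result)
```

Note that the space that preceded the URL survives, so the result ends with a trailing space; calling result.strip() removes it.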
Answer 2:
You can stop it from removing any whitespace as follows:
n = filter(lambda x: x in string.whitespace or x not in string.printable, main_text)
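Keeping all whitespace can leave tabs, newlines, or a dangling space where the English text used to be. The following is a rough Python 3 sketch that applies the same whitespace test and then collapses whatever whitespace remains into single spaces; the function name strip_ascii_keep_whitespace and the sample string are assumptions made for illustration:

```python
import string

def strip_ascii_keep_whitespace(text):
    # Keep whitespace (space, tab, newline, ...) and every character
    # outside string.printable, i.e. every non-ASCII character.
    kept = ''.join(
        ch for ch in text
        if ch in string.whitespace or ch not in string.printable
    )
    # str.split() with no arguments splits on runs of whitespace and
    # discards empty pieces, so joining with ' ' normalizes the spacing.
    return ' '.join(kept.split())

# Hypothetical sample mixing a tab, a newline, and a URL
sample = "وزارة الداخلية:\tلا تتوفر\nhttp://alriyadh.com/1031499"
cleaned = strip_ascii_keep_whitespace(sample)
print(cleaned)
```

This approach removes every ASCII character, so digits and Latin letters embedded in the Arabic text are dropped too; whether that is acceptable depends on the data.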
Source: https://stackoverflow.com/questions/29406247/how-to-remove-english-text-from-arabic-string-in-python