Question
Does a word passed to WordNet have to be fully diacritized, like "التُّفَّاحْ", or can it be the plain "التفاح"? Is there any library or service that takes an undiacritized Arabic word and returns a list of all its possible synonyms?
Answer 1:
To go from التُّفَّاحْ to التفاح you simply need to remove the diacritics, so what you need is a lexical normalization tool. Try Tashaphyne: download and install it, then use the normalize module (http://pythonhosted.org/Tashaphyne/Tashaphyne.normalize-module.html):
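# Python 3 syntax. Recent releases install as lowercase "tashaphyne";
# older ones used the capitalized package name "Tashaphyne".
from tashaphyne.normalize import normalize_hamza, normalize_lamalef, normalize_searchtext

text = 'التُّفَّاحْ'
print(normalize_hamza(text))       # unify the hamza forms
print(normalize_lamalef(text))     # expand lam-alef ligatures
print(normalize_searchtext(text))  # strip diacritics and tatweel, then normalize letters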
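If you only need to drop the diacritics and would rather not install anything, a standard-library sketch along these lines should be enough. It is not Tashaphyne's implementation: the function name strip_diacritics is made up for illustration, and it assumes that "diacritics" means the tashkeel marks U+064B through U+0652 plus the tatweel U+0640. It does no hamza or lam-alef normalization.

```python
import re

# Assumption: "diacritics" means the Arabic tashkeel marks
# U+064B (fathatan) through U+0652 (sukun), plus the tatweel U+0640.
_DIACRITICS = re.compile('[\u064B-\u0652\u0640]')


def strip_diacritics(text):
    """Return text with tashkeel marks and tatweel removed (hypothetical helper)."""
    return _DIACRITICS.sub('', text)


print(strip_diacritics('التُّفَّاحْ'))  # prints التفاح
```

The stripped word can then be used as the lookup key for the WordNet query.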
Source: https://stackoverflow.com/questions/16096559/arabic-wordnet-with-not-formatted-words