Arabic WordNet with not-formatted words

梦想的初衷 提交于 2020-01-05 08:47:42

问题


Is it necessary for the word input to WordNet to be formatted like "التُّفَّاحْ" and can't expect "التفاح"... is there any library or service taking not-formatted Arabic word returning a list of all its possible synonyms.


回答1:


From التُّفَّاحْ to التفاح, you simply want to remove the diacritics then you need a lexical normalization tool. Try Tashaphyne, download and install then use the normalize module http://pythonhosted.org/Tashaphyne/Tashaphyne.normalize-module.html :

from Tashaphyne import *

text = 'التُّفَّاحْ'
print normalize_hamza(text)
print normalize_lamalef(text)
print normalize_searchtext(text)


来源:https://stackoverflow.com/questions/16096559/arabic-wordnet-with-not-formatted-words

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!