问题
Short version:
If I have a stemmed word:Say 'comput' for 'computing', or 'sugari' for 'sugary'
Is there a way to construct it's closest noun form?That is 'computer', or 'sugar' respectively
Longer version:
I'm using python and NLTK, Wordnet to perform a few semantic similarity tasks on a bunch of words.
I noticed that most sem-sim scores work well only for nouns, while adjectives and verbs don't give any results.
Understanding the inaccuracies involved, I wanted to convert a word from its verb/adjective form to its noun form, so I may get an estimate of their similarity (instead of the 'NONE' that normally gets returned with adjectives).
I thought one way to do this would be to use a stemmer to get at the root word, and then try to construct the closest noun form of that root.
George-Bogdan Ivanov's algorithm from here works pretty well. I wanted to try alternative approaches. Is there any better way to convert a word from adjective/verb form to noun form?
回答1:
You might want to look at this example:
>>> from nltk.stem.wordnet import WordNetLemmatizer
>>> WordNetLemmatizer().lemmatize('having','v')
'have'
(from this SO answer) to see if it sends you in the right direction.
回答2:
First extract all the possible candidates from wordnet
synsets.
Then use difflib
to compare the strings against your target stem.
>>> from nltk.corpus import wordnet as wn
>>> from itertools import chain
>>> from difflib import get_close_matches as gcm
>>> target = "comput"
>>> candidates = set(chain(*[ss.lemma_names for ss in wn.all_synsets('n') if len([i for i in ss.lemma_names if target in i]) > 0]))
>>> gcm(target,candidates)[0]
A more human readable way to compute the candidates is as such:
candidates = set()
for ss in wn.all_synsets('n'):
for ln in ss.lemma_names: # get all possible lemmas for this synset.
for lemma in ln:
if target in lemma:
candidates.add(target)
来源:https://stackoverflow.com/questions/17083442/getting-the-closest-noun-from-a-stemmed-word