Is there a way to find the most common synonym of a word with NLTK? I would like to simplify a sentence by replacing each word in it with its most common synonym.
If a word use
Synonyms are tricky, but if you are starting out with a synset from WordNet and you simply want to choose the most common member of the set, it's pretty straightforward: just build your own frequency list from a corpus, look up each member of the synset, and pick the one with the highest count.
NLTK lets you build a frequency table in just a few lines of code. Here's one based on the Brown corpus:
import nltk
from nltk.corpus import brown
freqs = nltk.FreqDist(w.lower() for w in brown.words())
You can then look up the frequency of a word like this:
>>> print(freqs["valued"])
14
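The "look up each member and pick the maximum" step could be sketched roughly as follows. The function names are my own, not part of NLTK; freqs is assumed to be any mapping from lowercased word to count (the FreqDist built above works), and synset is assumed to be an NLTK WordNet Synset, whose lemma_names() method returns its member words.

```python
def most_common_synonym(candidates, freqs):
    """Return the candidate with the highest count in `freqs`.

    `freqs` is any mapping from lowercased word to count, e.g. an
    nltk.FreqDist. Words missing from the table count as 0. On a tie,
    max() keeps the first candidate it saw.
    """
    return max(candidates, key=lambda word: freqs.get(word.lower(), 0))


def most_common_in_synset(synset, freqs):
    """Hypothetical helper: apply the lookup to the members of a synset.

    Assumes `synset` is an NLTK WordNet Synset. Multiword lemmas such as
    "ice_cream" never occur as single tokens in a word list, so they
    simply score 0 here.
    """
    return most_common_synonym(synset.lemma_names(), freqs)
```

With the Brown table from above, something like most_common_in_synset(some_synset, freqs) would return the member of some_synset that occurs most often in the corpus, regardless of part of speech.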
Of course you'll need to do a little more work: I would count words separately for each of the major parts of speech (WordNet provides n, v, a, and r, for noun, verb, adjective, and adverb respectively), and use these POS-specific frequencies (after adjusting for the different tagset notations) to choose the right substitute.
>>> freq2 = nltk.ConditionalFreqDist((tag, wrd.lower()) for wrd, tag in
...                                  brown.tagged_words(tagset="universal"))
>>> print(freq2["ADJ"]["valued"])
0
>>> print(freq2["ADJ"]["dear"])
45
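Putting the POS-specific counts to use might look something like this. It is only a sketch: the mapping table and the function name are my own, and I am assuming that WordNet's s (satellite adjective) should be counted as ADJ along with a. cond_freqs is assumed to be indexed first by universal tag and then by lowercased word, like freq2 above.

```python
# Assumed correspondence between WordNet POS codes and universal tags.
# "s" is WordNet's code for satellite adjectives; treating it as ADJ
# is my own choice.
WORDNET_TO_UNIVERSAL = {
    "n": "NOUN",
    "v": "VERB",
    "a": "ADJ",
    "s": "ADJ",
    "r": "ADV",
}


def most_common_for_pos(candidates, wordnet_pos, cond_freqs):
    """Pick the candidate that is most frequent with the given POS.

    `cond_freqs[tag][word]` should give the count of `word` tagged as
    `tag`, e.g. an nltk.ConditionalFreqDist built from
    brown.tagged_words(tagset="universal").
    """
    tag = WORDNET_TO_UNIVERSAL[wordnet_pos]
    counts = cond_freqs[tag]
    # Unseen words count as 0; on a tie max() keeps the first candidate.
    return max(candidates, key=lambda word: counts.get(word.lower(), 0))
```

For a synset obtained from WordNet you could call most_common_for_pos(synset.lemma_names(), synset.pos(), freq2). With the counts shown above, an adjective synset containing both "valued" and "dear" would come out as "dear".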