NLTK Most common synonym (Wordnet) for each word

生来不讨喜 2021-02-10 01:42

Is there a way to find the most common synonym of a word with NLTK? I would like to simplify a sentence by replacing each word in it with its most common synonym.

If a word use

3 answers

名媛妹妹 (OP)

2021-02-10 02:19
Synonyms are tricky, but if you are starting out with a synset from WordNet and simply want to choose its most common member, it's pretty straightforward: build your own frequency list from a corpus, then look up each member of the synset and pick the one with the highest count.

NLTK lets you build a frequency table in just a few lines of code. Here's one based on the Brown corpus:
```
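import nltk
from nltk.corpus import brown  # requires nltk.download("brown") once
# Count every lowercased token in the Brown corpus
freqs = nltk.FreqDist(w.lower() for w in brown.words())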
```
You can then look up the frequency of a word like this:
```
>>> print(freqs["valued"]) 
14
```
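Putting the two pieces together, choosing the most common member of a synset could look like the sketch below. The helper names `most_common_member` and `simplify_word` are hypothetical, not part of NLTK. The sketch also makes two simplifying assumptions: it uses only the first synset WordNet returns for the word (no sense disambiguation), and it skips multi-word lemmas, which WordNet writes with underscores.

```python
from collections import Counter


def most_common_member(candidates, freqs):
    """Return the candidate with the highest count in freqs.

    freqs can be an nltk.FreqDist, a Counter, or a plain dict.
    On ties, the first candidate with the top count wins.
    """
    candidates = list(candidates)
    if not candidates:
        return None
    return max(candidates, key=lambda w: freqs.get(w.lower(), 0))


def simplify_word(word, freqs):
    """Replace word with the most frequent lemma of its first synset.

    Assumption: the first synset is the intended sense.
    """
    # Imported lazily; needs the 'wordnet' corpus to be downloaded.
    from nltk.corpus import wordnet as wn

    synsets = wn.synsets(word)
    if not synsets:
        return word  # unknown to WordNet, leave it alone
    # Skip multi-word lemmas such as "hold_dear"
    lemmas = [name for name in synsets[0].lemma_names() if "_" not in name]
    return most_common_member(lemmas, freqs) or word
```

With the `freqs` table built above, `simplify_word("valued", freqs)` would return whichever lemma of the first synset is most frequent in Brown. Because the lookup is not restricted to one part of speech, the result can be skewed by a word that is common as a verb but rare as an adjective.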
Of course, you'll need to do a little more work: I would count words separately for each of the major parts of speech (WordNet provides n, v, a, and r, for noun, verb, adjective, and adverb respectively) and use these POS-specific frequencies, after adjusting for the different tagset notations, to choose the right substitute.
```
>>> freq2 = nltk.ConditionalFreqDist((tag, wrd.lower()) for wrd, tag in
...         brown.tagged_words(tagset="universal"))

>>> print(freq2["ADJ"]["valued"])
0
>>> print(freq2["ADJ"]["dear"])
45
```
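The "adjusting for the different tagset notations" step can be sketched as a small lookup table from WordNet's POS letters to the universal tags used in `freq2`. The names `WORDNET_TO_UNIVERSAL`, `most_common_for_pos`, and `simplify_synset` are hypothetical. Mapping WordNet's satellite-adjective tag `s` to `ADJ` is an assumption of this sketch, as is skipping multi-word lemmas.

```python
# WordNet POS letter -> universal tagset label.
# "s" (satellite adjective) is folded into ADJ as an assumption.
WORDNET_TO_UNIVERSAL = {
    "n": "NOUN",
    "v": "VERB",
    "a": "ADJ",
    "s": "ADJ",
    "r": "ADV",
}


def most_common_for_pos(candidates, wn_pos, cond_freqs):
    """Pick the candidate that is most frequent with the given POS.

    cond_freqs is indexed as cond_freqs[tag][word], e.g. an
    nltk.ConditionalFreqDist or a dict of dicts.
    Returns None for an unmapped POS or an empty candidate list.
    """
    tag = WORDNET_TO_UNIVERSAL.get(wn_pos)
    candidates = list(candidates)
    if tag is None or not candidates:
        return None
    counts = cond_freqs.get(tag, {})
    return max(candidates, key=lambda w: counts.get(w.lower(), 0))


def simplify_synset(synset, cond_freqs):
    """Most frequent single-word lemma of a WordNet synset, by its POS."""
    lemmas = [name for name in synset.lemma_names() if "_" not in name]
    return most_common_for_pos(lemmas, synset.pos(), cond_freqs)
```

Given the counts shown above, `most_common_for_pos(["valued", "dear"], "a", freq2)` would choose `"dear"`, since `freq2["ADJ"]["dear"]` is 45 and `freq2["ADJ"]["valued"]` is 0.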