NLTK Most common synonym (Wordnet) for each word

生来不讨喜 2021-02-10 01:42

Is there a way to find the most common synonym of a word with NLTK? I would like to simplify a sentence by replacing each word in it with its most common synonym.

If a word use

3 Answers
  •  名媛妹妹
    2021-02-10 02:19

    Synonyms are tricky, but if you are starting out with a synset from WordNet and simply want to choose the most common member of the set, it's pretty straightforward: build your own frequency list from a corpus, look up each member of the synset in it, and pick the one with the highest count.

    NLTK lets you build a frequency table in just a few lines of code. Here's one based on the Brown corpus:

    import nltk
    from nltk.corpus import brown
    # Needs the corpus data: run nltk.download("brown") once beforehand.
    freqs = nltk.FreqDist(w.lower() for w in brown.words())
    

    You can then look up the frequency of a word like this:

    >>> print(freqs["valued"]) 
    14
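
    Picking the maximum can then be sketched as below. This is only an illustration, not part of the original answer: the helper names `most_common_member` and `most_common_in_synset` are made up for this example, and the toy counts stand in for a real table such as `freqs`.

```python
from collections import Counter


def most_common_member(lemma_names, freqs):
    """Return the candidate with the highest count in freqs.

    lemma_names: iterable of candidate words, e.g. Synset.lemma_names().
    freqs: mapping from lowercase word to count, e.g. an nltk.FreqDist.
    """
    candidates = list(lemma_names)
    if not candidates:
        return None
    # WordNet writes multiword lemmas with underscores ("hot_dog"); a table
    # built from single tokens will score those as 0.
    return max(candidates, key=lambda name: freqs.get(name.lower(), 0))


def most_common_in_synset(synset, freqs):
    """Hypothetical wrapper for an NLTK WordNet synset (needs the wordnet data)."""
    return most_common_member(synset.lemma_names(), freqs)


# Toy counts standing in for a real frequency table (illustrative numbers).
toy_freqs = Counter({"dear": 45, "beloved": 7, "darling": 3})
print(most_common_member(["beloved", "darling", "dear"], toy_freqs))  # dear
```

    With the real data you would pass a synset obtained from `nltk.corpus.wordnet` together with `freqs`. Ties go to whichever member comes first in the synset.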
    

    Of course you'll need to do a little more work: I would count words separately for each of the major parts of speech (WordNet provides n, v, a, and r, for noun, verb, adjective, and adverb respectively), and use these POS-specific frequencies (after adjusting for the different tagset notations) to choose the right substitute.

    >>> freq2 = nltk.ConditionalFreqDist((tag, wrd.lower()) for wrd, tag in 
            brown.tagged_words(tagset="universal"))
    
    >>> print(freq2["ADJ"]["valued"])
    0
    >>> print(freq2["ADJ"]["dear"])
    45
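
    The tagset adjustment could look roughly like this. It is a sketch under assumptions: the mapping table and the helper `most_common_for_pos` are invented for illustration, treating WordNet's "s" (satellite adjective) as ADJ is my own choice, and the VERB count in the toy data is made up.

```python
# WordNet POS letters mapped to universal tagset labels.
# Treating "s" (satellite adjective) as ADJ is an assumption of this sketch.
WN_TO_UNIVERSAL = {"n": "NOUN", "v": "VERB", "a": "ADJ", "s": "ADJ", "r": "ADV"}


def most_common_for_pos(lemma_names, wn_pos, cond_freqs):
    """Pick the candidate that is most frequent with the given part of speech.

    cond_freqs: mapping tag -> (mapping word -> count), for example an
    nltk.ConditionalFreqDist keyed by universal tags.
    """
    tag = WN_TO_UNIVERSAL.get(wn_pos)
    candidates = list(lemma_names)
    if tag is None or not candidates:
        return None
    # Check membership first so a ConditionalFreqDist (a defaultdict)
    # does not get a new empty entry created as a side effect.
    counts = cond_freqs[tag] if tag in cond_freqs else {}
    return max(candidates, key=lambda name: counts.get(name.lower(), 0))


# Toy table: the ADJ counts mirror the output above, the VERB count is made up.
toy_cond = {"ADJ": {"dear": 45, "valued": 0}, "VERB": {"valued": 10}}
print(most_common_for_pos(["valued", "dear"], "a", toy_cond))  # dear
```

    For a real synset `syn`, the call would be `most_common_for_pos(syn.lemma_names(), syn.pos(), freq2)`.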
    
