NLTK: most common synonym (WordNet) for each word

生来不讨喜 2021-02-10 01:42

Is there a way to find the most common synonym of a word with NLTK? I would like to simplify a sentence by replacing each word in it with its most common synonym.

If a word use

3 Answers
  •  忘了有多久
    2021-02-10 02:13

    Synonyms are a huge and open area of work in natural language processing.

    In your example, how is the program supposed to know what the allowed synonyms are? One method might be to keep a dictionary of sets of synonyms for each word. However, this runs into problems when a word belongs to more than one part of speech: "dear" is an adjective, but "valued" can be an adjective or a past-tense verb.
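    A minimal sketch of that dictionary idea, assuming a small hand-built table: keying each entry on (word, part of speech) instead of on the word alone keeps the adjective and verb senses of "valued" apart. The table contents and the synonyms_for helper are hypothetical, for illustration only.

```python
from __future__ import annotations

# Hypothetical hand-built synonym table keyed by (word, part-of-speech tag).
# Keying on the tag keeps "valued" (adjective) separate from "valued" (verb).
SYNONYMS: dict[tuple[str, str], set[str]] = {
    ("dear", "ADJ"): {"valued", "beloved", "precious"},
    ("valued", "ADJ"): {"dear", "prized", "treasured"},
    ("valued", "VERB"): {"appraised", "assessed", "priced"},
}


def synonyms_for(word: str, pos: str) -> set[str]:
    """Return the synonym set for `word` in part of speech `pos` (empty if unknown)."""
    return SYNONYMS.get((word.lower(), pos), set())


print(sorted(synonyms_for("valued", "ADJ")))   # ['dear', 'prized', 'treasured']
print(sorted(synonyms_for("valued", "VERB")))  # ['appraised', 'assessed', 'priced']
```

    The catch is that you now need a part-of-speech tag for every word in the sentence before you can look anything up, and the table itself still has to come from somewhere.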

    Context is also important: the bigram "dear friend" might be more common than "valued friend", but "valued customer" would be more common than "dear customer". So, the sense of a given word needs to be accounted for too.
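    To make the context point concrete, here is a sketch that chooses between candidate synonyms by counting how often each one appears directly before the following word. The tiny corpus and the helper names (bigram_counts, best_in_context) are made up for illustration; a real system would need counts from a large corpus.

```python
from __future__ import annotations

from collections import Counter


def bigram_counts(tokens: list[str]) -> Counter:
    """Count adjacent word pairs, e.g. ("dear", "friend")."""
    return Counter(zip(tokens, tokens[1:]))


def best_in_context(candidates: list[str], next_word: str, counts: Counter) -> str:
    """Pick the candidate seen most often directly before `next_word`.

    Ties (including all-zero counts) go to the earliest candidate in the
    list, because max() returns the first maximal element.
    """
    return max(candidates, key=lambda c: counts[(c, next_word)])


# Toy corpus standing in for real text; these counts are illustrative only.
corpus = (
    "my dear friend wrote to me . a dear friend called . "
    "thank you valued customer . every valued customer matters . "
    "our dear customer"
).split()

counts = bigram_counts(corpus)
print(best_in_context(["dear", "valued"], "friend", counts))    # dear
print(best_in_context(["dear", "valued"], "customer", counts))  # valued
```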

    Another method might be to look at a large amount of text and see which words appear in similar contexts. You need a huge corpus for this to be effective, though, and you have to decide how large a context window to use (a bigram context? A 20-gram context?).
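    A rough sketch of the "similar contexts" idea: represent each word by counts of the words within a fixed window around it, then compare words by cosine similarity. The window parameter is the context-size choice mentioned above (window=1 looks only at immediate neighbours). The toy corpus and function names are assumptions for illustration.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


def context_vectors(tokens: list[str], window: int) -> dict[str, Counter]:
    """Map each word to counts of the words within `window` positions of it."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the word itself
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy corpus; not real data.
corpus = "the dear friend smiled . the valued friend smiled . the old car stalled .".split()
vecs = context_vectors(corpus, window=1)
print(round(cosine(vecs["dear"], vecs["valued"]), 3))  # 1.0
print(round(cosine(vecs["dear"], vecs["car"]), 3))     # 0.0
```

    In this toy text "dear" and "valued" have identical neighbours, so their similarity is 1.0, while "car" shares none with "dear". A larger window pulls in more distant words, which tends to capture broader topical similarity rather than substitutability.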

    I recommend you take a look at applications of WordNet (https://wordnet.princeton.edu/), which groups English words into sets of synonyms (synsets) by sense and part of speech, and was designed to help with problems like these. Unfortunately, I'm not sure you'll find a way to "solve" synonyms on your own, but keep looking and asking questions!

    Edit: I should have included this link to an older question as well:

    How to get synonyms from nltk WordNet Python

    And the NLTK documentation on its interface with WordNet:

    http://www.nltk.org/howto/wordnet.html

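    I don't think these fully address your question, however. The only usage statistics WordNet ships with are sense-tagged frequency counts (NLTK exposes them as Lemma.count()), and those come from a limited hand-tagged corpus rather than from general usage, which depends on the corpus you use. You should still be able to apply its synsets in a method like the ones above.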
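    A minimal sketch of applying WordNet synsets this way, assuming NLTK and its WordNet data are installed (nltk.download("wordnet")). It ranks candidate synonyms by Lemma.count(), the sense-tagged frequency bundled with WordNet, so treat the result as a rough proxy for "most common", not a measure of usage in your own text. The function names are hypothetical; the NLTK import is done inside the function so the ranking helper can be used without NLTK.

```python
from __future__ import annotations


def rank_by_count(pairs: list[tuple[str, int]]) -> list[str]:
    """Merge duplicate names by summing counts, then sort highest first (ties alphabetical)."""
    totals: dict[str, int] = {}
    for name, count in pairs:
        totals[name] = totals.get(name, 0) + count
    return sorted(totals, key=lambda n: (-totals[n], n))


def wordnet_synonyms_by_count(word: str, pos: str | None = None) -> list[str]:
    """List WordNet synonyms of `word`, most frequently sense-tagged first.

    `pos` is an NLTK WordNet tag such as "n", "v", "a" or "r"; None means all.
    Lemma.count() reflects WordNet's tagged corpus, not your own text.
    """
    from nltk.corpus import wordnet as wn  # lazy import: only needed here

    pairs = [
        (lemma.name().replace("_", " "), lemma.count())
        for synset in wn.synsets(word, pos=pos)
        for lemma in synset.lemmas()
        if lemma.name().lower() != word.lower()  # drop the word itself
    ]
    return rank_by_count(pairs)
```

    For example, wordnet_synonyms_by_count("dear", pos="a") (where "a" is NLTK's wn.ADJ) lists adjective synonyms of "dear" by tagged frequency. Many lemmas have a count of 0, so ties are common, and this still ignores the sentence context discussed above.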
