Methods for automated synonym detection

后端 未结 4 1597
忘了有多久
忘了有多久 2021-02-03 14:04

I am currently working on a neural network based approach to short document classification, and since the corpuses I am working with are usually around ten words, the standard s

4条回答
  •  后悔当初
    2021-02-03 14:44

    I'm not sure if I misunderstand your question. Do you require the system to be able to reason based on your input data alone, or would it be acceptable to refer to an external dictionary?

    If it is acceptable, I would recommend you to take a look at http://wordnet.princeton.edu/ which is a database of English word relationships. (It also exists for a few other languges.) These relationships include synonyms, antonyms, hyperonyms (which is what you really seem to be looking for, rather than synonyms), hyponyms, etc.

    The hyperonym / hyponym relationship links more generic terms to more specific ones. The words "banana" and "orange" are hyponyms of "fruit"; it is a hyperonym of both. http://en.wikipedia.org/wiki/Hyponymy Of course, "orange" is ambiguous, and is also a hyponym of "color".

    You asked for a method, but I can only point you to data. Even if this turns out to be useful, you will obviously need quite a bit of work to use it for your particular application. For one thing, how do you know when you have reached a suitable level of abstraction? Unless your input is hevily normalized, you will have a mix of generic and specific terms. Do you stop at "citrus","fruit", "plant", "animate", "concrete", or "noun"? (Sorry, just made up this particular hierarchy.) Still, hope this helps.

提交回复
热议问题