Measuring semantic similarity between two phrases [closed]


爱一瞬间的悲伤

2 Answers

长情又很酷

2021-01-31 05:46

This is a very complicated problem.

The main technique that I can think of (before going into more complicated NLP processes) would be to represent each phrase as a vector of its words and apply cosine similarity (or any other metric) to each pair of phrases. On its own, this solution would be quite ineffective because of the non-matching problem: two sentences might refer to the same concept with different words, and then their vectors share no terms.
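A minimal sketch of that baseline, assuming a plain whitespace tokenizer and raw term counts as weights (both are illustrative choices, not something the answer prescribes). It shows the non-matching problem directly: phrases with the same meaning but no shared words score zero.

```python
import math
from collections import Counter


def bag_of_words(phrase):
    """Turn a phrase into a term -> count mapping (naive whitespace tokenizer)."""
    return Counter(phrase.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term -> weight mappings."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Identical phrases: similarity is (up to rounding) 1.
same = cosine_similarity(bag_of_words("fast car race"),
                         bag_of_words("fast car race"))

# Same concept, different words: no shared terms, so similarity is 0.
different_words = cosine_similarity(bag_of_words("car race"),
                                    bag_of_words("automobile competition"))

print(same, different_words)
```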

To solve this issue, you should transform the initial representation of each phrase into a more "conceptual" one. One option is to extend each word with its synonyms (e.g. using WordNet). Another option is to apply techniques such as distributional semantics (DS) (http://liawww.epfl.ch/Publications/Archive/Besanconetal2001.pdf), which extend the representation of each term with the words most likely to appear with it.

Example: a document represented as {"car","race"} would be transformed to {"car","automobile","race"} with synonyms, while with DS it would become something like {"car","wheel","road","pilot", ...}.

Obviously the transformed representation won't be binary: each term, whether original or added by the expansion, will have an associated weight.
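The weighted synonym expansion can be sketched as follows. The `SYNONYMS` table and the `SYNONYM_WEIGHT` of 0.5 are hypothetical stand-ins chosen for illustration; in practice the synonyms would come from a resource such as WordNet and the weights would need tuning.

```python
import math

# Hypothetical hand-written synonym table, standing in for a real lexical resource.
SYNONYMS = {"car": ["automobile"], "automobile": ["car"]}
# Assumed weight for added synonyms; original terms keep weight 1.0.
SYNONYM_WEIGHT = 0.5


def expand(terms, synonyms=SYNONYMS, weight=SYNONYM_WEIGHT):
    """Build a term -> weight vector, adding synonyms with a reduced weight."""
    vector = {}
    for term in terms:
        vector[term] = max(vector.get(term, 0.0), 1.0)
        for syn in synonyms.get(term, []):
            vector[syn] = max(vector.get(syn, 0.0), weight)
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight mappings."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Without expansion, only "race" matches.
baseline = cosine({"car": 1.0, "race": 1.0}, {"automobile": 1.0, "race": 1.0})

# With expansion, "car" and "automobile" partially match as well.
doc = expand(["car", "race"])
other = expand(["automobile", "race"])
score = cosine(doc, other)

print(doc, baseline, score)
```

With these assumed weights the expanded phrases score higher than the unexpanded ones, which is the effect the expansion is meant to produce.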

I hope this helps.

野趣味

2021-01-31 05:51

Maybe the cortical.io API could help with your problem. The approach there is that every word is converted into a semantic fingerprint that characterizes its meaning with 16K semantic features. Phrases, sentences or longer texts are converted into fingerprints by ORing the word fingerprints together. After this conversion into a binary vector representation, semantic distance can easily be computed using measures such as Euclidean distance or cosine similarity. All necessary conversion and comparison functions are provided by the API.
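To illustrate the idea only (this does not call the cortical.io API), here is a toy sketch of ORing binary fingerprints and comparing them. The 16-bit patterns in `WORD_FINGERPRINTS` are invented for the example; real fingerprints would have about 16K positions and come from the service.

```python
import math

# Made-up 16-bit fingerprints stored as Python ints (one bit per semantic feature).
WORD_FINGERPRINTS = {
    "car":        0b0000000000111100,
    "automobile": 0b0000000001111000,
    "race":       0b0000111100000000,
    "banana":     0b1111000000000000,
}


def phrase_fingerprint(phrase):
    """OR the fingerprints of all known words in the phrase."""
    fp = 0
    for word in phrase.lower().split():
        fp |= WORD_FINGERPRINTS.get(word, 0)
    return fp


def popcount(x):
    """Number of set bits."""
    return bin(x).count("1")


def cosine_similarity(a, b):
    """Cosine similarity of two binary vectors: shared bits over the norm product."""
    if a == 0 or b == 0:
        return 0.0
    return popcount(a & b) / math.sqrt(popcount(a) * popcount(b))


def euclidean_distance(a, b):
    """Euclidean distance of two binary vectors: sqrt of the differing bit count."""
    return math.sqrt(popcount(a ^ b))


car_race = phrase_fingerprint("car race")
auto_race = phrase_fingerprint("automobile race")

close = cosine_similarity(car_race, auto_race)
far = cosine_similarity(car_race, phrase_fingerprint("banana"))

print(close, far, euclidean_distance(car_race, auto_race))
```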
