Is there an algorithm that tells the semantic similarity of two phrases?


input: phrase 1, phrase 2

output: semantic similarity value (between 0 and 1), or the probability these two phrases are talking about the same thing

11 Answers

别跟我提以往

2020-11-27 10:37

You might want to check out this paper:

Sentence similarity based on semantic nets and corpus statistics (PDF)

I've implemented the algorithm described. Our context was very general (effectively any two English sentences), and we found the approach too slow. The results, while promising, were not good enough, and were unlikely to become so without considerable extra effort.

You don't give much context, so I can't necessarily recommend this, but reading the paper could help you understand how to tackle the problem.

Regards,

Matt.

你的背包

2020-11-27 10:38

Try SimService, which provides a service for computing top-n similar words and phrase similarity.

梦谈多话

2020-11-27 10:39

I would have a look at statistical techniques that take into account the probability of each word appearing in a sentence. This lets you give less weight to common words such as 'and', 'or', 'the' and more weight to words that appear less regularly and are therefore better discriminating factors. For example, if you have two sentences:

1) The Smith-Waterman algorithm gives you a similarity measure between two strings.

2) We have reviewed the Smith-Waterman algorithm and we found it to be good enough for our project.

The fact that the two sentences share the words "Smith-Waterman" and "algorithm" (which are not as common as 'and', 'or', etc.) lets you say that the two sentences might indeed be talking about the same topic.

In summary, I would suggest you have a look at: 1) string similarity measures; 2) statistical methods.
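One way to sketch the word-weighting idea is TF-IDF with cosine similarity. This is only a minimal illustration, not a tuned implementation: the function names, the smoothed IDF formula, and the small background corpus you would pass in are all assumptions made for the example. Rare words such as "smith-waterman" get a high weight, while words found in every background sentence (such as "the") get the lowest weight.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase, and keep hyphenated words such as "smith-waterman" in one piece.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def idf_weights(corpus_tokens):
    # Smoothed inverse document frequency (an assumed formula):
    # words found in many background documents get a low weight.
    n = len(corpus_tokens)
    df = Counter(word for tokens in corpus_tokens for word in set(tokens))
    return {word: math.log((1 + n) / (1 + count)) + 1.0 for word, count in df.items()}


def tfidf_vector(tokens, idf, default_idf):
    # Term count times IDF; words never seen in the background get default_idf.
    counts = Counter(tokens)
    return {word: count * idf.get(word, default_idf) for word, count in counts.items()}


def cosine(a, b):
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(phrase1, phrase2, background):
    # `background` is a list of ordinary sentences used only to estimate
    # how common each word is.
    corpus = [tokenize(doc) for doc in background]
    idf = idf_weights(corpus)
    default_idf = math.log(1 + len(corpus)) + 1.0  # weight for unseen words
    v1 = tfidf_vector(tokenize(phrase1), idf, default_idf)
    v2 = tfidf_vector(tokenize(phrase2), idf, default_idf)
    return cosine(v1, v2)
```

Because all weights are positive, the score falls between 0 and 1. It measures weighted word overlap, so it is a rough proxy for "same topic" rather than a true probability.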

Hope this helps.

面向向阳花

2020-11-27 10:41

This requires that your algorithm actually know what you're talking about. It can be done in some rudimentary form by just comparing words and looking for synonyms, etc., but any sort of accurate result would require some form of intelligence.
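To show what the rudimentary form might look like, here is a small sketch that maps words to a canonical synonym and then measures word overlap (Jaccard similarity). The synonym table is hypothetical and hand-made for this example; a real system would presumably draw on a proper thesaurus.

```python
import re

# Hypothetical, hand-made synonym table, for illustration only.
SYNONYMS = {
    "car": "automobile",
    "auto": "automobile",
    "buy": "purchase",
    "bought": "purchase",
    "purchased": "purchase",
}


def normalize(phrase):
    # Lowercase, split into words, and replace each word by its canonical synonym.
    words = re.findall(r"[a-z0-9']+", phrase.lower())
    return {SYNONYMS.get(word, word) for word in words}


def synonym_overlap(phrase1, phrase2):
    # Jaccard similarity of the two normalized word sets, between 0 and 1.
    a, b = normalize(phrase1), normalize(phrase2)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

With this table, "I bought a car" and "I purchased an automobile" share three of five distinct words. The limitation shows up quickly, though: "river bank" and "bank account" still overlap on "bank" even though the meanings differ, which is where some real understanding would be needed.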

孤独总比滥情好

2020-11-27 10:41

Take a look at http://mkusner.github.io/publications/WMD.pdf. This paper describes an algorithm called Word Mover's Distance (WMD) that tries to uncover semantic similarity. It relies on the word similarities captured by word2vec embeddings. Combining it with the pretrained GoogleNews-vectors-negative300 vectors yields good results.
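To give a feel for the idea without downloading the model, here is a rough sketch of the relaxed variant of WMD (a cheaper lower bound that, as I understand the paper, is obtained by dropping one of the flow constraints), not the full optimal-transport computation. The 2-D vectors are made up for this example; real use would load the pretrained word2vec vectors instead. Note that the result is a distance (0 means identical, larger means less similar), not a score between 0 and 1.

```python
import math
from collections import Counter

# Hypothetical 2-D toy embeddings, for illustration only. Real use would load
# pretrained vectors such as GoogleNews-vectors-negative300.
TOY_VECTORS = {
    "obama": (1.0, 0.9),
    "president": (0.9, 1.0),
    "speaks": (0.2, 0.8),
    "greets": (0.3, 0.7),
    "media": (0.8, 0.1),
    "press": (0.7, 0.2),
    "banana": (-1.0, -0.8),
    "ripens": (-0.7, -1.0),
}


def nbow(tokens):
    # Normalized bag of words: each known word's share of the phrase's mass.
    counts = Counter(t for t in tokens if t in TOY_VECTORS)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def one_sided_cost(src, dst):
    # Relaxation: every source word sends all of its mass to the nearest
    # destination word, ignoring how much mass that word can receive.
    return sum(
        mass * min(math.dist(TOY_VECTORS[w], TOY_VECTORS[v]) for v in dst)
        for w, mass in src.items()
    )


def relaxed_wmd(tokens1, tokens2):
    a, b = nbow(tokens1), nbow(tokens2)
    if not a or not b:
        raise ValueError("each phrase needs at least one word with a known vector")
    # The larger of the two one-sided costs is the tighter lower bound.
    return max(one_sided_cost(a, b), one_sided_cost(b, a))
```

With these toy vectors, `["obama", "speaks", "media"]` comes out much closer to `["president", "greets", "press"]` than to `["banana", "ripens"]`, even though the first two phrases share no words.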
