Locality Sensitive Hash Implementation? [closed]


一整个雨季


4 Answers

春和景丽

2020-12-24 03:08

I realise you explicitly asked for C/C++/C#, but there is a Python port of the nilsimsa hash which might be easier to grok than other, larger libraries.

礼貌的吻别

2020-12-24 03:14

There is an excellent intro article on the MSDN blogs here: http://blogs.msdn.com/b/spt/archive/2008/06/11/locality-sensitive-hashing-lsh-and-min-hash.aspx

There is also at least one C++ library whose source code you can inspect here: http://sourceforge.net/projects/lshkit/

滥情空心

2020-12-24 03:19

There is also a Java implementation on Hadoop called LikeLike. It does a good job on documents.

Currently Likelike supports only min-wise independent permutations (MinHash). Min-wise independent permutations are applied to recommendation in Google News.
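To give an idea of what min-wise hashing does, here is a minimal Python sketch. It is not Likelike's code or API: the function names, the 3-character shingles, the seeded BLAKE2 hashes standing in for random permutations, and the banding parameters are all assumptions made for illustration.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-character shingles of text (k is an assumed default)."""
    text = text.lower()
    if len(text) <= k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def seeded_hash(seed, item):
    """64-bit hash of item under a given seed; stands in for one random permutation."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=64):
    """For each seeded hash function, keep the minimum hash value over the set."""
    return [min(seeded_hash(seed, it) for it in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def band_keys(signature, rows_per_band=4):
    """Split a signature into bands; two items sharing any key are LSH candidates."""
    return {
        (start // rows_per_band, tuple(signature[start:start + rows_per_band]))
        for start in range(0, len(signature), rows_per_band)
    }


if __name__ == "__main__":
    a = minhash_signature(shingles("the quick brown fox jumps over the lazy dog"))
    b = minhash_signature(shingles("the quick brown fox jumps over the lazy cat"))
    print(estimated_jaccard(a, b))
    print(bool(band_keys(a) & band_keys(b)))
```

Each band key can be used directly as a hash-table key, so only documents that collide in at least one band need to be compared in full. More hash functions give a tighter similarity estimate at the cost of larger signatures.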

长发绾君心

2020-12-24 03:20
For strings you can use an approximate matching algorithm:
- Generate a random reference string.
- For every string, compute its distance from that shared reference string using an algorithm like http://www.dotnetperls.com/levenshtein
If two strings are equidistant from the reference string, then chances are that they are similar to each other. And there you go: you have a locality sensitive hash implementation for strings.

You can create different hash buckets for a range of distances.

EDIT: You can try other variations of string distance. A simpler algorithm would just return the number of characters two strings have in common.
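The reference-string idea in this answer could look roughly like the Python sketch below. The helper names, the reference length of 16, the fixed seed, and the bucket width of 2 are assumptions for illustration, and the Levenshtein function is a plain dynamic-programming version rather than the code from the linked page.

```python
import random
import string


def levenshtein(a, b):
    """Edit distance between a and b, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def make_reference(length=16, seed=0):
    """Random lowercase reference string; seeded so every string sees the same one."""
    rng = random.Random(seed)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def bucket_of(s, reference, bucket_width=2):
    """Map a string to a bucket covering a range of distances to the reference."""
    return levenshtein(s, reference) // bucket_width


def build_buckets(strings, reference, bucket_width=2):
    """Group strings by their distance bucket."""
    buckets = {}
    for s in strings:
        buckets.setdefault(bucket_of(s, reference, bucket_width), []).append(s)
    return buckets


if __name__ == "__main__":
    ref = make_reference()
    words = ["locality", "localita", "sensitive", "sensitivity", "hash"]
    for bucket, members in sorted(build_buckets(words, ref).items()):
        print(bucket, members)
```

By the triangle inequality, two strings at edit distance d from each other differ by at most d in their distance to the reference, so near-duplicates land in the same or a neighbouring bucket. The converse does not hold: unrelated strings can be equally far from the reference, so a bucket is best treated as a candidate list to verify with a direct comparison. Using several independent reference strings would likely cut down on such false candidates.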