Fast fuzzy/approximate search in dictionary of strings in Ruby

渐次进展 2021-02-08 23:55

I have a dictionary of 50K to 100K strings (which can be 50+ characters long), and I am trying to find whether a given string is in the dictionary with some "edit" distance tolerance…
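For reference, the naive approach is to compute a bounded edit distance against every entry. The following is a minimal Ruby sketch that assumes plain Levenshtein distance (insertions, deletions, substitutions) is the intended "edit" distance; `levenshtein_within` and `fuzzy_lookup` are hypothetical names, not from any library. With 50K to 100K entries, this costs one O(m·n) computation per entry per query, which is the cost that clustering or hashing approaches try to avoid.

```ruby
# Levenshtein distance with an early exit: returns the exact distance when it is
# <= max, and some value greater than max otherwise.
def levenshtein_within(a, b, max)
  # Each edit changes the length by at most one, so a large length gap rules a match out.
  return max + 1 if (a.length - b.length).abs > max

  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    # Row minima never decrease, so once every cell exceeds max the final distance does too.
    return max + 1 if curr.min > max

    prev = curr
  end
  prev.last
end

# Brute force: one bounded edit-distance computation per dictionary entry.
def fuzzy_lookup(dictionary, query, max_distance)
  dictionary.select { |word| levenshtein_within(word, query, max_distance) <= max_distance }
end

dict = %w[apple apply ample maple banana]
p fuzzy_lookup(dict, "appel", 2) # => ["apple", "apply"]
```

The early exit only prunes work for clearly distant strings; every entry is still visited, so query time stays linear in the dictionary size.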

4 answers

[愿得一人] (OP)

2021-02-09 00:43

If you are prepared to get involved with machine learning approaches, this paper by Geoff Hinton is a good starting point:

http://www.cs.toronto.edu/~hinton/absps/sh.pdf

These kinds of approaches are used at places like Google.

Essentially, you cluster your dictionary strings by similarity. When a query string comes in, instead of calculating the edit distance against the entire data set, you compare it only against the strings in the matching cluster, which reduces query time significantly.
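A minimal sketch of the cluster-then-search idea, assuming plain Levenshtein distance as the similarity measure. This is not the method from the linked paper: cluster centers here are picked naively (the first word of each fixed-size slice), and `ClusteredDictionary`, `search`, and `probe` are hypothetical names. A real system would choose centers with a proper clustering algorithm.

```ruby
# Hypothetical "cluster, then search" index: each word is assigned to its nearest
# center by edit distance, and a query scans only the clusters nearest to it.
class ClusteredDictionary
  def initialize(words, num_clusters)
    words = words.uniq
    step = [words.length / num_clusters, 1].max
    # Naive center choice: the first word of every `step`-sized slice
    # (this gives roughly, not exactly, num_clusters centers).
    @centers = words.each_slice(step).map(&:first)
    @clusters = Hash.new { |h, k| h[k] = [] }
    words.each do |w|
      nearest = @centers.min_by { |c| levenshtein(w, c) }
      @clusters[nearest] << w
    end
  end

  # Scans only the `probe` nearest clusters, so a true match that sits in another
  # cluster can be missed: query time is traded for some recall.
  def search(query, max_distance, probe: 2)
    @centers.min_by(probe) { |c| levenshtein(query, c) }
            .flat_map { |c| @clusters[c] }
            .select { |w| levenshtein(query, w) <= max_distance }
  end

  private

  # Standard two-row dynamic-programming Levenshtein distance.
  def levenshtein(a, b)
    prev = (0..b.length).to_a
    a.each_char.with_index(1) do |ca, i|
      curr = [i]
      b.each_char.with_index(1) do |cb, j|
        curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca == cb ? 0 : 1)].min
      end
      prev = curr
    end
    prev.last
  end
end

dict = ClusteredDictionary.new(%w[apple apply ample maple banana bandana cabana], 3)
p dict.search("bandana", 1, probe: 1) # => ["banana", "bandana"]
```

A query now costs one distance computation per center plus one per word in the probed clusters, instead of one per dictionary entry. Raising `probe` recovers recall at the price of scanning more words.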

P.S. I did a bit of googling and found a Ruby implementation of another, similar approach called Locality Sensitive Hashing here: https://github.com/bbcrd/ruby-lsh
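As an illustration of one LSH variant suited to strings, here is a self-contained sketch of MinHash over padded character trigrams with banding. It does not use the ruby-lsh gem or its API; `MinHashIndex`, `num_bands`, `rows_per_band`, and the use of CRC32 as a stand-in hash family are all assumptions for illustration. Strings that share a band become candidates, and only those candidates get an exact edit-distance check.

```ruby
require "set"
require "zlib"

# Hypothetical MinHash LSH index over padded character n-grams (not the ruby-lsh API).
class MinHashIndex
  def initialize(num_bands: 8, rows_per_band: 2, n: 3)
    @bands = num_bands
    @rows = rows_per_band
    @n = n
    @buckets = Hash.new { |h, k| h[k] = Set.new }
  end

  def add(word)
    band_keys(word).each { |key| @buckets[key] << word }
  end

  # Words sharing at least one band with the query (may include false positives).
  def candidates(query)
    band_keys(query).each_with_object(Set.new) do |key, found|
      found.merge(@buckets.fetch(key, []))
    end
  end

  # Filter the LSH candidates with an exact edit-distance check.
  def lookup(query, max_distance)
    candidates(query).select { |w| levenshtein(query, w) <= max_distance }
  end

  private

  # Padded n-grams, e.g. "cat" with n = 3 -> "  c", " ca", "cat", "at ", "t  "
  def ngrams(word)
    pad = " " * (@n - 1)
    padded = pad + word + pad
    (0..padded.length - @n).map { |i| padded[i, @n] }.uniq
  end

  # One min-hash per seed; CRC32 of "seed:gram" stands in for a family of hash functions.
  def signature(word)
    grams = ngrams(word)
    Array.new(@bands * @rows) { |seed| grams.map { |g| Zlib.crc32("#{seed}:#{g}") }.min }
  end

  # Each band of `rows` min-hashes becomes one bucket key, tagged with its band index.
  def band_keys(word)
    signature(word).each_slice(@rows).with_index.map { |rows, band| [band, *rows] }
  end

  # Standard two-row dynamic-programming Levenshtein distance.
  def levenshtein(a, b)
    prev = (0..b.length).to_a
    a.each_char.with_index(1) do |ca, i|
      curr = [i]
      b.each_char.with_index(1) do |cb, j|
        curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca == cb ? 0 : 1)].min
      end
      prev = curr
    end
    prev.last
  end
end

index = MinHashIndex.new
%w[apple apply ample maple banana].each { |w| index.add(w) }
p index.lookup("apple", 1)
```

Because matching is probabilistic, a near neighbor can be missed if it shares no band with the query. More bands raise recall; more rows per band make each band stricter and cut down on false candidates.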
