Fast fuzzy/approximate search in a dictionary of strings in Ruby

渐次进展 2021-02-08 23:55

I have a dictionary of 50K to 100K strings (each can be 50+ characters long) and I am trying to find whether a given string is in the dictionary with some "edit" distance tolerance.

4 Answers
  •  [愿得一人]
    2021-02-09 00:43

    If you are prepared to get involved with machine learning approaches, this paper by Geoff Hinton is a good starting point:

    http://www.cs.toronto.edu/~hinton/absps/sh.pdf

    These kinds of approaches are used at places like Google.

    Essentially, you cluster your dictionary strings based on similarity. When a query string comes in, instead of calculating the edit distance against the entire data set, you compare it only against the strings in the matching cluster, which reduces query time significantly.
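
    To make the "only compare within a cluster" idea concrete, here is a minimal sketch in plain Ruby. It is not the semantic hashing method from the paper: as a deliberately crude stand-in for learned clusters, it buckets the dictionary by string length, relying on the fact that two strings whose lengths differ by more than k cannot be within edit distance k. The helper names (levenshtein, build_buckets, fuzzy_lookup) and the sample words are assumptions made up for this illustration.

```ruby
# Dynamic-programming Levenshtein distance, keeping one row at a time.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution (or match)
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: "cluster" the dictionary by string length.
def build_buckets(words)
  words.group_by(&:length)
end

# Hypothetical helper: scan only the buckets whose length is within
# max_dist of the query, then confirm each candidate with Levenshtein.
def fuzzy_lookup(buckets, query, max_dist)
  lengths = (query.length - max_dist)..(query.length + max_dist)
  lengths.flat_map { |len| buckets.fetch(len, []) }
         .select { |word| levenshtein(word, query) <= max_dist }
end

buckets = build_buckets(%w[ruby rugby python perl])
matches = fuzzy_lookup(buckets, "rubby", 1)
```

    With a tolerance of 1, only the buckets for lengths 4 to 6 are scanned here. On a real dictionary where most strings have similar lengths this filter alone will not prune much, so a similarity-based grouping (such as the hashing approaches this answer points to) would take the place of build_buckets.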

    P.S. I did a bit of googling and found a Ruby implementation of a similar approach, Locality Sensitive Hashing, here: https://github.com/bbcrd/ruby-lsh
