levenshtein alternative

隐瞒了意图╮ 2021-01-12 09:57

I have a big set of queries and use Levenshtein distance to detect typos, but Levenshtein now causes MySQL to use all available CPU time. My query is a fulltext search + levenshtein in a UNIO

1 answer
  • 2021-01-12 10:49

    If you are tied to MySQL alone, there is no easy solution.

    Usually this is solved by using a specialized n-gram index for fast candidate lookup and filtering, and then calculating Levenshtein on only about 10-50 candidates, which is much faster than calculating Levenshtein for all pairs.
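
As a rough illustration of that two-step idea, here is a minimal Python sketch, not a production implementation. All names (`ngrams`, `build_index`, `suggest`) and the defaults (trigrams, 50 candidates, maximum distance 2) are assumptions made for the example. The padding with two leading spaces and one trailing space imitates the way pg_trgm forms trigrams.

```python
from collections import defaultdict


def ngrams(word, n=3):
    """Return the set of character n-grams of `word`, padded pg_trgm-style."""
    padded = " " * (n - 1) + word.lower() + " "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def levenshtein(a, b):
    """Plain dynamic-programming edit distance, keeping two rows in memory."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


def build_index(vocabulary, n=3):
    """Inverted index: n-gram -> set of words that contain it."""
    index = defaultdict(set)
    for word in vocabulary:
        for gram in ngrams(word, n):
            index[gram].add(word)
    return index


def suggest(query, index, n=3, max_candidates=50, max_distance=2):
    """Filter candidates by shared n-grams, then rank the survivors by Levenshtein."""
    shared = defaultdict(int)
    for gram in ngrams(query, n):
        for word in index.get(gram, ()):
            shared[word] += 1
    # Cheap step: keep only the words sharing the most n-grams with the query.
    candidates = sorted(shared, key=lambda w: (-shared[w], w))[:max_candidates]
    # Expensive step: Levenshtein on this short candidate list only.
    scored = [(levenshtein(query, w), w) for w in candidates]
    return sorted((d, w) for d, w in scored if d <= max_distance)


vocabulary = ["levenshtein", "lucene", "postgresql", "mysql", "fulltext"]
index = build_index(vocabulary)
print(suggest("levensthein", index))
```

The expensive distance function runs only on words that already share at least one n-gram with the query, so its cost depends on the candidate limit rather than on the size of the whole vocabulary.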

    Specialized fulltext search engines like Solr/Lucene have this built in.

    PostgreSQL has the pg_trgm contrib module (http://www.postgresql.org/docs/9.0/static/pgtrgm.html), which works like a charm.

    You can even simulate this in MySQL using fulltext indexing, but you have to collect the words from all your documents, convert them to n-grams, create fulltext indexes on them, and hack it all together for fast lookup. That brings all sorts of trouble with redundancy and synchronization, so it is not worth your time.
