How to use n-gram approximate matching with Solr?

感动是毒 2021-02-05 21:58

We have a database of movies and series, and as the data comes from many sources of varying reliability, we'd like to be able to do fuzzy string matching on the titles of episo

2 Answers
  • 2021-02-05 22:33

    To answer the last part of your question: Solr also has an n-gram filter. So instead of the n-gram tokenizer, use a tokenizer such as "WhitespaceTokenizer", apply all the filters that should run before n-gram generation, and then add this one:

    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="3" />
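
    For context, here is a minimal sketch of how that filter might sit inside a complete field type. The field type name "title_ngram" and the choice of a lowercase filter as the pre-n-gram step are assumptions for illustration, not part of the original setup:

    ```xml
    <!-- Hypothetical field type; the name "title_ngram" is illustrative only. -->
    <fieldType name="title_ngram" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- Split on whitespace rather than using an n-gram tokenizer. -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- Assumed pre-n-gram filter: normalize case before grams are produced. -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Emit the 2- and 3-character grams of each token. -->
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="3"/>
      </analyzer>
    </fieldType>
    ```

    Because the analyzer has no type attribute, the same chain would apply at both index and query time, so a query term is also broken into grams before matching.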
    
  • 2021-02-05 22:49

    The solution turned out to be very simple: AND was set as the default operator, so if any single n-gram of the query didn't match, the whole query failed. It was therefore sufficient to add:

    <solrQueryParser defaultOperator="OR" />
    

    in my schema definition.
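
    As far as I know, newer Solr releases no longer accept this schema-level setting and expect the operator to be passed as the q.op request parameter instead. A sketch of setting it as a default in solrconfig.xml, assuming the standard "/select" handler is the one being queried:

    ```xml
    <!-- Assumes queries go through the stock /select search handler. -->
    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <!-- Match documents containing any of the query's n-grams, not all of them. -->
        <str name="q.op">OR</str>
      </lst>
    </requestHandler>
    ```

    The same parameter can also be sent per request (q.op=OR), which leaves the default untouched for other queries.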
