How to use n-grams approximate matching with Solr?


We have a database of movies and series, and as the data comes from many sources of varying reliability, we'd like to be able to do fuzzy string matching on the titles of episo

2 Answers

礼貌的吻别

2021-02-05 22:33
To answer the last part of your question: Solr also has an n-gram filter. So you should not use the n-gram tokenizer; pick another one instead ("WhitespaceTokenizer", for example), apply all the filters that need to run before the n-grams are generated, and then add this one:
```
<filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="3" />
```
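For context, here is a minimal sketch of how that filter might sit inside a complete field type. The names `text_ngram` and `title_ngram` are made up for illustration, and the lowercase filter is only an assumed example of a "pre-ngram" filter; adapt the chain to your own schema.

```xml
<!-- Hypothetical field type: the name "text_ngram" is an assumption, not from the question -->
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <!-- A single <analyzer> applies to both indexing and querying -->
  <analyzer>
    <!-- Split on whitespace instead of using the n-gram tokenizer -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Example pre-ngram filter (assumed): normalize case before generating n-grams -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit 2- and 3-character grams for every token -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="3"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type above -->
<field name="title_ngram" type="text_ngram" indexed="true" stored="false"/>
```

Because the same analyzer runs at query time in this sketch, the query string is also broken into n-grams, so how those n-grams are combined (AND vs. OR) decides whether a slightly misspelled title can still match.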
無奈伤痛

2021-02-05 22:49
The solution turned out to be very simple: AND was set as the default operator, so every n-gram produced from the query had to match, and if any one of them didn't, the whole query failed. So, it was sufficient to add:
```
<solrQueryParser defaultOperator="OR" />
```
in my schema definition.
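As far as I know, newer Solr releases (7.0 and later) no longer accept `defaultOperator` in the schema, and the `q.op` request parameter is used instead. A sketch of setting it as a handler default in solrconfig.xml, assuming the stock `/select` handler, could look like this:

```xml
<!-- Assumption: the default /select search handler; merge into your existing definition -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Combine query clauses (here, the individual n-grams) with OR unless the request overrides it -->
    <str name="q.op">OR</str>
  </lst>
</requestHandler>
```

The same setting can also be passed per request by adding `q.op=OR` to the query parameters.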