We have a database of movies and series, and as the data comes from many sources of varying reliability, we'd like to be able to do fuzzy string matching on the titles of episodes
To answer the last part of your question: Solr also has an n-gram filter. So you should not use the n-gram tokenizer. Instead, use a different tokenizer (for example "WhitespaceTokenizer"), apply all the filters that should run before n-gramming, and then add this one:
<filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="3" />
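To see what that filter does to a title, here is a rough Python simulation of the chain: split on whitespace, lowercase (an assumed pre-n-gram filter, not something the answer prescribes), then emit every substring of length 2 to 3 from each token. It reproduces only the set of grams for minGramSize="2" maxGramSize="3". It ignores token positions and offsets, and it does not claim to be Solr's actual implementation; the function names are hypothetical.

```python
def ngrams(token: str, min_gram: int = 2, max_gram: int = 3) -> list[str]:
    """Every substring of token whose length lies in [min_gram, max_gram].

    Tokens shorter than min_gram yield nothing, as with the filter's defaults.
    """
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


def analyze(text: str) -> list[str]:
    """Whitespace tokenizer -> lowercase (assumed) -> n-grams for each token."""
    grams = []
    for token in text.split():
        grams.extend(ngrams(token.lower()))
    return grams


print(analyze("Pilot"))  # ['pi', 'pil', 'il', 'ilo', 'lo', 'lot', 'ot']
```

Because only the grams are indexed, a misspelled title still shares most of its grams with the correct one. That shared overlap is what makes the fuzzy matching possible.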
The solution turned out to be very simple: AND was set as the default operator, so if even one of the query's n-grams didn't match, the whole query failed. It was enough to add:
<solrQueryParser defaultOperator="OR" />
in my schema definition.
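To show why the default operator matters, here is a rough Python simulation (hypothetical helper names, not Solr code). It compares a misspelled query against a title. With AND, every query gram must occur in the title. With OR, one shared gram is enough. The overlap fraction is only a crude stand-in for how Solr's relevance scoring would rank titles that share more grams higher. It is not Solr's scoring formula.

```python
def ngrams(token: str, min_gram: int = 2, max_gram: int = 3) -> list[str]:
    # All substrings of length min_gram..max_gram (assumed to match the filter's gram set)
    return [
        token[start:start + size]
        for start in range(len(token))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(token)
    ]


def analyze(text: str) -> list[str]:
    # Whitespace tokenization, lowercasing (assumed), then n-grams per token
    return [g for token in text.lower().split() for g in ngrams(token)]


def matches(query: str, title: str, default_operator: str = "AND") -> bool:
    title_grams = set(analyze(title))
    hits = [g in title_grams for g in analyze(query)]
    # AND: every query gram must be present; OR: a single shared gram suffices
    return all(hits) if default_operator == "AND" else any(hits)


def overlap(query: str, title: str) -> float:
    # Fraction of query grams found in the title; a crude proxy for ranking
    title_grams = set(analyze(title))
    query_grams = analyze(query)
    if not query_grams:
        return 0.0
    return sum(g in title_grams for g in query_grams) / len(query_grams)


print(matches("pilto", "Pilot"))        # False: 'ilt', 'lt', 'lto', 'to' are missing
print(matches("pilto", "Pilot", "OR"))  # True: 'pi', 'pil', 'il' are shared
print(overlap("pilto", "Pilot"))        # 3 of 7 query grams shared
```

Switching to OR makes nearly any title with a common gram a candidate, so you rely on ranking to put the close matches first. As far as I know, newer Solr versions no longer honour defaultOperator in the schema, and the q.op request parameter is the usual way to set this.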