I have a list of cities in a MySQL db that is hooked up to a UI for autocompletion. I am currently using solr-5.3.0. Data is imported through scheduled delta imports. I have the following questions:
I want to add a spell checker to this feature. I tried using:
- DirectSolrSpellChecker
- IndexBasedSpellChecker
- FileBasedSpellChecker
Of these three, only FileBasedSpellChecker returns suggestions that all exist in the db. For example, when searching for cologne I get results like:
{
  "responseHeader": {
    "status": 0,
    "QTime": 4,
    "params": {
      "q": "searchfield:kolakata",
      "indent": "true",
      "spellcheck": "true",
      "wt": "json"}},
  "response": {"numFound": 0, "start": 0, "docs": []},
  "spellcheck": {
    "suggestions": [
      "cologne", {
        "numFound": 4,
        "startOffset": 12,
        "endOffset": 19,
        "suggestion": ["Cologne", "Bologna", "Cogne", "Bastogne"]}],
    "collations": [
      "collation", "searchfield:Cologne"]}}
These cities are pretty accurate and all exist in the db/file.
But when I use the other two, I get results like:
{
  "responseHeader": {
    "status": 0,
    "QTime": 4,
    "params": {
      "q": "searchfield:kolakata",
      "indent": "true",
      "spellcheck": "true",
      "wt": "json"}},
  "response": {"numFound": 0, "start": 0, "docs": []},
  "spellcheck": {
    "suggestions": [
      "cologne", {
        "numFound": 4,
        "startOffset": 12,
        "endOffset": 19,
        "suggestion": ["Cologne", "Cologn", "Colognei"]}],
    "collations": [
      "collation", "searchfield:Cologne"]}}
These include suggestions that are not cities present in my db.
Although FileBasedSpellChecker gives satisfactory results, I am a little apprehensive about using it, because I would need to update the file manually every time a city is added or removed. Also, it is generally not advisable to use FileBasedSpellChecker.
I also need to make the suggestions searchable. Currently I am accessing the doc returned in
to search for results in that city, but now I want the suggester to return the results in the same
instead of just string results, so that it integrates with the UI properly.
One minor change requested is to sort the suggestions in ascending order of edit (Levenshtein) distance. This is not a hard requirement and is negotiable.
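To make the ordering I am asking about concrete, here is a minimal Python sketch of sorting suggestions by Levenshtein distance to the typed term. It is only an illustration done on the client side after the response arrives, not something my Solr setup does today; the helper names `levenshtein` and `sort_by_edit_distance` are my own, and the comparison is case-insensitive by assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def sort_by_edit_distance(query: str, suggestions: list[str]) -> list[str]:
    """Return suggestions ordered by ascending distance to the query (hypothetical helper)."""
    q = query.lower()
    # sorted() is stable, so ties keep the order Solr returned them in
    return sorted(suggestions, key=lambda s: levenshtein(q, s.lower()))


if __name__ == "__main__":
    print(sort_by_edit_distance("cologne", ["Bastogne", "Cogne", "Cologne", "Bologna"]))
```

Ideally Solr itself would return the suggestions in this order, so that the UI does not need a step like this.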
Edit: My solrconfig looks like this:
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">searchfield</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.dictionary">file</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_ngram</str>
  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>
My schema looks like this:
<field name="name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="latlng" type="location" indexed="true" stored="true" multiValued="false" />
<field name="citycode" type="string" indexed="true" stored="true" multiValued="false" />
<field name="country" type="string" indexed="true" stored="true" multiValued="false" />
<field name="searchscore" type="float" indexed="true" stored="true" multiValued="false" />
<field name="searchfield" type="text_ngram" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="true" />
<solrQueryParser defaultOperator="OR"/>
<copyField source="name" dest="searchfield"/>