Question
I have a list of cities in a MySQL database that is hooked up to a UI for autocompletion. I am currently using Solr 5.3.0, and data is imported through scheduled delta imports. I have the following questions:
I want to add a spell checker to this feature. I tried:
- DirectSolrSpellChecker
- IndexBasedSpellChecker
- FileBasedSpellChecker
Out of these three, only FileBasedSpellChecker returns suggestions limited to cities that actually exist in the DB. For example, when searching for "cologne" I got results like:
{
  "responseHeader":{
    "status":0,
    "QTime":4,
    "params":{
      "q":"searchfield:kolakata",
      "indent":"true",
      "spellcheck":"true",
      "wt":"json"}},
  "response":{"numFound":0,"start":0,"docs":[]},
  "spellcheck":{
    "suggestions":[
      "cologne",{
        "numFound":4,
        "startOffset":12,
        "endOffset":19,
        "suggestion":["Cologne", "Bologna", "Cogne", "Bastogne"]}],
    "collations":[
      "collation","searchfield:Cologne"]}}
These suggestions are accurate, and all of them exist in the DB/file.
But with the other two, I get results like:
{
  "responseHeader":{
    "status":0,
    "QTime":4,
    "params":{
      "q":"searchfield:kolakata",
      "indent":"true",
      "spellcheck":"true",
      "wt":"json"}},
  "response":{"numFound":0,"start":0,"docs":[]},
  "spellcheck":{
    "suggestions":[
      "cologne",{
        "numFound":4,
        "startOffset":12,
        "endOffset":19,
        "suggestion":["Cologne", "Cologn", "Colognei"]}],
    "collations":[
      "collation","searchfield:Cologne"]}}
Some of these suggestions (for example "Cologn" and "Colognei") are not cities in my DB.
Although FileBasedSpellChecker gives satisfactory results, I am a little apprehensive about using it, because I would need to update the file manually every time a city is added or removed. It is also generally considered inadvisable to use FileBasedSpellChecker.
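As a sketch only, the dictionary could come from the main index instead of a hand-maintained file by pointing DirectSolrSpellChecker at a field that holds whole, lowercased words. This rests on an assumption the question does not confirm: that stray suggestions such as "Cologn" appear because the Direct and Index-based checkers read their terms from searchfield, whose text_ngram type indexes fragments of each city name. The field name spellfield and the field type text_spell are hypothetical and would have to be added to the schema. Because DirectSolrSpellChecker reads the live index, new cities should be picked up once a delta import commits (terms from deleted documents may linger until segments are merged).

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Hypothetical non-n-gram type used to analyze the spellcheck query -->
  <str name="queryAnalyzerFieldType">text_spell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- Hypothetical whole-word field; terms are read straight from the main index -->
    <str name="field">spellfield</str>
    <!-- Levenshtein-based distance built into Lucene -->
    <str name="distanceMeasure">internal</str>
    <!-- DirectSolrSpellChecker supports at most 2 edits -->
    <int name="maxEdits">2</int>
  </lst>
</searchComponent>
```

With a setup like this, the /select handler's spellcheck.dictionary default would change from file to default.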
I also need the suggestions to be searchable. Currently I read the documents returned in
"response":{"docs":[<some-format>]}
to get the search results for a city, but now I want the suggester to return its results in the same <some-format> instead of plain strings, so that it integrates properly with the UI.

One minor additional request is to sort the suggestions in ascending order of edit (Levenshtein) distance. This is not a hard requirement and is negotiable.
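For the ordering request, one hedged option is the spellchecker's comparatorClass setting. With the value score (also the default), suggestions are ordered by their string-distance score, where higher means closer to the query, and ties are broken by term frequency. This roughly amounts to ascending edit distance, while freq would put the most frequent terms first. The field name spellfield is hypothetical. The fragment does not cover returning full documents: the spellcheck section only contains strings, so one approach would be to re-run the returned collation as a normal query.

```xml
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <!-- Hypothetical whole-word field, not part of the question's schema -->
  <str name="field">spellfield</str>
  <!-- Levenshtein-based distance; produces the score used for ordering -->
  <str name="distanceMeasure">internal</str>
  <!-- "score": closest suggestions first, ties broken by frequency ("freq" reverses that priority) -->
  <str name="comparatorClass">score</str>
</lst>
```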
EDIT: My solrconfig.xml looks like this:
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">searchfield</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.dictionary">file</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
and the spellcheck component:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_ngram</str>
  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>
My schema looks like this:
<field name="name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="latlng" type="location" indexed="true" stored="true" multiValued="false" />
<field name="citycode" type="string" indexed="true" stored="true" multiValued="false" />
<field name="country" type="string" indexed="true" stored="true" multiValued="false" />
<field name="searchscore" type="float" indexed="true" stored="true" multiValued="false" />
<field name="searchfield" type="text_ngram" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="true" />
<defaultSearchField>searchfield</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
<copyField source="name" dest="searchfield"/>
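A minimal sketch of the schema side, assuming a dedicated spellcheck field is acceptable: a copy of name that is split into whole words and lowercased, with no n-gram filter, so every term in its dictionary is a real word from a city name. The names text_spell and spellfield are hypothetical. Since copyField runs at index time, the existing delta imports would keep the field filled, although documents indexed before the change would need a full re-import to populate it.

```xml
<!-- Hypothetical field type: whole words, lowercased, no n-grams -->
<fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical spellcheck field, filled from "name" at index time -->
<field name="spellfield" type="text_spell" indexed="true" stored="false" multiValued="true"/>
<copyField source="name" dest="spellfield"/>
```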
Source: https://stackoverflow.com/questions/40743684/need-help-to-decide-between-the-type-of-spellchecker-to-use-in-solr