问题
I have problems with solr stopwords in my autosuggest. All stopwords was replaced by _ symbol.
For example I have text "the simple text in" in field "deal_title". When I try to search word "simple" solr show me next result "_ simple text _" but I expect "simple text".
Could someone explain me why this works in such way and how to fix it ? Here is part of my schema.xml
<fieldType class="solr.TextField" name="text_auto">
<analyzer type="index">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true" outputUnigramsIfNoShingles="false" />
</analyzer>
<analyzer type="query">
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
</analyzer>
</fieldType>
<field name="deal_title" type="text_auto" indexed="true" stored="true" required="false" multiValued="false"/>
<fieldType name="text_general" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
回答1:
My solution to this in Solr 6.3 (where enablePositionIncrements="false"
isn't possible anymore) was to:
- remove stopwords
- shingle with
fillerToken=""
(which removes the_
) - remove leading and trailing spaced
remove duplicates
<filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_de.txt" ignoreCase="true"/> <filter class="solr.ShingleFilterFactory" fillerToken=""/> <filter class="solr.PatternReplaceFilterFactory" pattern="(^ | $)" replacement=""/> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
回答2:
To fix this you need to use<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true" enablePositionIncrements="false" />
and <luceneMatchVersion>4.3</luceneMatchVersion>
in solconfig.xml
来源:https://stackoverflow.com/questions/28459949/solr-stop-words-replaced-with-symbol