How to configure stemming in Solr?

渐次进展 2021-01-02 17:06

I added "American" to the Solr index. When I search for "America", there are no results.

How should schema.xml be configured to get results?

Current configuration

2 Answers
  • 2021-01-02 17:58

    Use only one stemmer per analyzer. As @Marko already mentioned, EnglishPorterFilterFactory is deprecated, so remove it from your analyzers.

    I used SnowballPorterFilterFactory for both the index and query analyzers:

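    <fieldType name="text_stem" class="solr.TextField">
        <analyzer>
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <!-- the only stemmer in the chain; language defaults to English -->
            <filter class="solr.SnowballPorterFilterFactory" language="English"/>
            <!-- other filters -->
        </analyzer>
    </fieldType>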
    

    The fieldType definition is fairly self-explanatory, but just in case:

    • Tokenizer solr.WhitespaceTokenizerFactory: splits the input text into tokens (words), using whitespace as the delimiter.

    • Filter solr.SnowballPorterFilterFactory: applies a stemming algorithm to each token. In the example above I chose the Snowball Porter stemming algorithm. Solr provides implementations of several popular stemming algorithms.

    You can also look at the other stemmers Solr ships with, e.g. HunspellStemFilterFactory and KStemFilterFactory.
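
    As a rough sketch of what swapping in one of those other stemmers could look like, the field type below uses KStemFilterFactory in place of the Snowball filter. The name text_kstem is hypothetical, and the LowerCaseFilterFactory line is my own assumption (it is not in the configuration above), added so that "American" and "american" are treated as the same token before stemming:

    ```xml
    <!-- Hypothetical field type: same shape as text_stem, different stemmer -->
    <fieldType name="text_kstem" class="solr.TextField">
        <analyzer>
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <!-- assumption: normalize case before stemming -->
            <filter class="solr.LowerCaseFilterFactory"/>
            <!-- still exactly one stemmer in the chain -->
            <filter class="solr.KStemFilterFactory"/>
        </analyzer>
    </fieldType>
    ```

    Different stemmers reduce words differently, so it is worth checking the tokens each filter produces for "America" and "American" on the Analysis screen of the Solr admin UI before reindexing.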

  • 2021-01-02 18:00

    Why would you have two stemmers?
    Try removing EnglishPorterFilterFactory (deprecated) from both of your analyzer types, rebuild the index, and then check whether a search for America returns American.

    If that doesn't work, the other thing you can try is to remove both of your stemmer filters and add SnowballPorterFilterFactory with language="English" instead.
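
    A minimal sketch of that second option, assuming separate index and query analyzers as in the question. The name text_en_stem is hypothetical, and the tokenizer and lowercase filter are placeholders for whatever the existing chain already uses; the point is only that each analyzer ends up with a single stemmer, and the same one on both sides:

    ```xml
    <!-- Hypothetical field type: one stemmer per analyzer, identical at index and query time -->
    <fieldType name="text_en_stem" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <!-- assumption: lowercase before stemming -->
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.SnowballPorterFilterFactory" language="English"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.SnowballPorterFilterFactory" language="English"/>
        </analyzer>
    </fieldType>
    ```

    Documents indexed with the old analyzer chain keep their old terms, so the index has to be rebuilt after changing the field type.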
