Solr: combining EdgeNGramFilterFactory and NGramFilterFactory

我在风中等你 2021-02-09 13:59

I have a situation where I need to use both EdgeNGramFilterFactory and NGramFilterFactory.

I am using NGramFilterFactory to perform a "contains" style search with min

2 answers
  •  囚心锁ツ
    2021-02-09 14:35

    Start by applying the EdgeNGramFilter with min = 1 and max = 1000 (a maximum large enough that the entire original token is always included). Example:

    hello => 'h', 'he', 'hel', 'hell', 'hello'
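To make the first step concrete, here is a minimal Python sketch of what an edge n-gram filter does to a single token. It is a simulation, not Solr code: the function name `edge_ngrams` and its parameters are made up for illustration and only mirror the min/max settings described above.

```python
def edge_ngrams(token, min_gram=1, max_gram=1000):
    """Return every prefix of `token` with length between min_gram and max_gram."""
    # Hypothetical helper: a plain-Python stand-in for the edge n-gram step.
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("hello"))  # ['h', 'he', 'hel', 'hell', 'hello']
```

Because max is far larger than any realistic token, the last prefix is always the full original token.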

    Second, apply the NGramFilter with min = 2. (The example below also uses max = 2 for simplicity.)

    'h', 'he', 'hel', 'hell', 'hello' => 'h', 'he', 'he', 'el', 'he', 'el', 'll', 'he', 'el', 'll', 'lo'

    You now have several identical tokens, because the NGramFilter was applied to every "partial" token produced by the EdgeNGramFilter. Simply apply the RemoveDuplicatesTokenFilter to remove them.

    'h', 'he', 'he', 'el', 'he', 'el', 'll', 'he', 'el', 'll', 'lo' => 'h', 'he', 'el', 'll', 'lo'
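The three steps above can be sketched end to end in plain Python. This is only a simulation under stated assumptions, and every function name here is hypothetical. Two simplifications matter: tokens shorter than min are passed through unchanged (which is what the example in this answer implies for 'h'), and duplicates are removed by text alone, ignoring token positions. In a real Solr setup, whether short tokens survive depends on the filter version and configuration, so it is worth checking the result on the Analysis screen of the Solr admin UI.

```python
def edge_ngrams(token, min_gram=1, max_gram=1000):
    """Prefixes of `token` with length between min_gram and max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def ngrams(token, min_gram=2, max_gram=2):
    """All substrings of `token` with length between min_gram and max_gram."""
    # ASSUMPTION: a token shorter than min_gram is kept as-is, matching the
    # example in the answer. A real filter may drop it instead.
    if len(token) < min_gram:
        return [token]
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


def remove_duplicates(tokens):
    """Drop repeated tokens while keeping first-seen order (positions ignored)."""
    return list(dict.fromkeys(tokens))


def analyze(word):
    """Chain the three simulated filters in the order the answer describes."""
    partials = edge_ngrams(word)
    grams = [g for partial in partials for g in ngrams(partial)]
    return remove_duplicates(grams)


print(analyze("hello"))  # ['h', 'he', 'el', 'll', 'lo']
```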

    The field now supports a single-character "startsWith" query and a multi-character "contains" query.
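A rough way to see which queries such a field would match is to build the set of indexed terms for a word and look the query up in it. The sketch below is a hypothetical simulation, not Solr behaviour verified against a running instance. It assumes the query term is looked up as a whole (no n-gramming at query time), that the n-gram step uses a large max rather than the max = 2 of the simplified example, and that the one-character prefix is kept in the index.

```python
def index_terms(word, min_gram=2, max_gram=1000):
    """Simulated set of indexed terms for `word` after the filter chain."""
    terms = set()
    for end in range(1, len(word) + 1):
        prefix = word[:end]  # output of the edge n-gram step
        if len(prefix) < min_gram:
            terms.add(prefix)  # ASSUMPTION: short prefixes are kept
            continue
        for start in range(len(prefix)):
            longest = min(max_gram, len(prefix) - start)
            for size in range(min_gram, longest + 1):
                terms.add(prefix[start:start + size])
    return terms


def matches(query, word):
    """True if the whole query term appears among the indexed terms."""
    return query in index_terms(word)


print(matches("h", "hello"))    # True: single-character "startsWith"
print(matches("ell", "hello"))  # True: multi-character "contains"
print(matches("l", "hello"))    # False: single-character "contains" is not indexed
```

Under these assumptions a single character matches only at the start of the word, which is the trade-off the combined chain makes.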
