Token Chars Mapping to Ngram Filter ElasticSearch NEST

有刺的猬 2021-01-14 18:53

I'm trying to replicate the mapping below using NEST and am running into an issue mapping the token chars to the tokenizer.

{
   "settings": {
      "analy

1 Answer
  • 2021-01-14 19:05

    The NGram tokenizer supports token characters (token_chars): characters that belong to one of the listed classes are kept in tokens, and the tokenizer splits on any character that isn't in one of them.

    The NGram token filter, on the other hand, operates on the tokens already produced by a tokenizer, so its only options are the minimum and maximum gram sizes to produce.
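To make the difference concrete, here is a minimal, self-contained sketch that imitates the two behaviours in plain C#. It is not Elasticsearch or NEST code: `NGramSketch`, `Tokenize` and `Grams` are hypothetical names made up for this illustration, and the logic is a simplification of what the real analysis components do.

```csharp
using System;
using System.Collections.Generic;

public static class NGramSketch
{
    // Simplified stand-in for the ngram tokenizer: characters accepted by
    // isTokenChar are kept, every other character acts as a split point, and
    // each resulting run is expanded into grams of length minGram..maxGram.
    public static List<string> Tokenize(
        string text, int minGram, int maxGram, Func<char, bool> isTokenChar)
    {
        var grams = new List<string>();
        int start = 0;
        while (start < text.Length)
        {
            if (!isTokenChar(text[start])) { start++; continue; }

            // Find the end of the current run of accepted characters.
            int end = start;
            while (end < text.Length && isTokenChar(text[end])) end++;

            grams.AddRange(Grams(text.Substring(start, end - start), minGram, maxGram));
            start = end;
        }
        return grams;
    }

    // Simplified stand-in for the ngram token filter: the input is already a
    // single token, so gram sizes are the only thing left to configure.
    public static List<string> Grams(string token, int minGram, int maxGram)
    {
        var grams = new List<string>();
        for (int i = 0; i < token.Length; i++)
            for (int len = minGram; len <= maxGram && i + len <= token.Length; len++)
                grams.Add(token.Substring(i, len));
        return grams;
    }

    public static void Main()
    {
        // Letters and digits are token characters; spaces and punctuation split.
        var grams = Tokenize("2 Quick Foxes.", 3, 3,
            c => char.IsLetter(c) || char.IsDigit(c));
        Console.WriteLine(string.Join(", ", grams)); // Qui, uic, ick, Fox, oxe, xes
    }
}
```

The character-class predicate only appears in `Tokenize`; `Grams` never sees the original text, which is why the filter has nowhere to apply token_chars.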

    Based on your current analysis chain, you likely want something like the following:

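    // nGramFilters is assumed to be the collection of token filter names from the question.
    var createIndexResponse = client.CreateIndex(defaultIndex, c => c
        .Settings(st => st
            .Analysis(an => an
                .Analyzers(anz => anz
                    .Custom("ngram_analyzer", cc => cc
                        // Refer to the tokenizer defined below by name.
                        .Tokenizer("ngram_tokenizer")
                        .Filters(nGramFilters)
                    )
                )
                .Tokenizers(tz => tz
                    .NGram("ngram_tokenizer", td => td
                        .MinGram(2)
                        .MaxGram(20)
                        // token_chars belong on the tokenizer, not on a token filter.
                        .TokenChars(
                            TokenChar.Letter,
                            TokenChar.Digit,
                            TokenChar.Punctuation,
                            TokenChar.Symbol
                        )
                    )
                )
            )
        )
    );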
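
For comparison, if n-grams are wanted as a token filter instead, a sketch against the same NEST 6.x style API might look like the following. The connection URI, the index name, and the names "ngram_filter" and "ngram_filter_analyzer" are assumptions made up for this illustration, as is the choice of the built-in standard tokenizer and lowercase filter; the token filter descriptor only exposes the gram sizes.

```csharp
using System;
using Nest;

public static class NGramFilterExample
{
    public static void Main()
    {
        // Hypothetical connection details and index name, for illustration only.
        var client = new ElasticClient(
            new ConnectionSettings(new Uri("http://localhost:9200")));
        var defaultIndex = "my-index";

        var createIndexResponse = client.CreateIndex(defaultIndex, c => c
            .Settings(st => st
                .Analysis(an => an
                    .TokenFilters(tf => tf
                        // Only min/max gram sizes here; there is no TokenChars option.
                        .NGram("ngram_filter", ng => ng
                            .MinGram(2)
                            .MaxGram(20)
                        )
                    )
                    .Analyzers(anz => anz
                        .Custom("ngram_filter_analyzer", cc => cc
                            // The tokenizer decides where the text is split ...
                            .Tokenizer("standard")
                            // ... and the filter then grams each resulting token.
                            .Filters("lowercase", "ngram_filter")
                        )
                    )
                )
            )
        );

        Console.WriteLine(createIndexResponse.IsValid);
    }
}
```

With this shape, which characters end up inside a token is determined entirely by the tokenizer chosen, so it is not equivalent to the token_chars configuration above.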
    