How to practically use a keyword analyzer in azure-search?

99封情书 posted on 2019-12-01 12:11:22

Short answer:

The behavior you're observing is correct.

Semantically, your search query blue bear means: find all documents that match the term blue or the term bear. Since you are using the keyword tokenizer, the terms that you indexed are blue bear and blue bear123. The terms blue and bear individually don't exist in your index. That's why only the phrase query returns the result you are expecting.


Long answer:

Let me explain how the analyzer is applied during query processing and how it's applied during document indexing.

On the indexing side, the analyzer you defined processes elements of the keyWordList collection independently. The terms that end up in your inverted index are:

  • blue bear (since you're using the lowercase filter, blue bear and Blue Bear are tokenized to the same term).
  • blue bear123

    As you'd expect, blue bear is one term - not split into two on the space - since you're using the keyword tokenizer. The same applies to blue bear123.
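
The indexing behavior described above can be sketched in a few lines of Python. This is only a toy model, not the service's implementation: the function names, the document id and the sample keyWordList values are hypothetical, chosen to reproduce the two terms listed above.

```python
def keyword_lowercase_analyzer(text: str) -> list[str]:
    """Toy stand-in for a keyword tokenizer plus a lowercase filter.

    The keyword tokenizer emits the whole input as one token (no split
    on spaces); the lowercase filter then lowercases that token.
    """
    return [text.lower()]


def build_inverted_index(docs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Analyze every collection element independently and index its terms."""
    index: dict[str, set[str]] = {}
    for doc_id, values in docs.items():
        for value in values:
            for term in keyword_lowercase_analyzer(value):
                index.setdefault(term, set()).add(doc_id)
    return index


# Hypothetical document: one keyWordList collection with two elements.
docs = {"1": ["Blue Bear", "blue bear123"]}
index = build_inverted_index(docs)

print(sorted(index))    # ['blue bear', 'blue bear123']
print("blue" in index)  # False: the single word was never indexed
```

Note that neither blue nor bear appears as a key, which is why a plain term query for either word finds nothing.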

On the query processing side, two things happen:

  1. Your search query is rewritten to: blue|bear (find documents with blue or bear). This is because searchMode=any is used by default. If you used searchMode=all, your search query would be rewritten to blue+bear (find documents with blue and bear).

    The query parser takes your search query string and separates query operators (such as +, |, * etc.) from query terms. Then it decomposes the search query into subqueries of supported types, e.g., terms followed by the suffix operator ‘*’ become a prefix query, quoted terms become a phrase query, etc. Terms that are not preceded or followed by any of the supported operators become individual term queries.

    In your example, the query parser decomposed your query string blue|bear into two term queries with terms blue and bear respectively. The search engine looks for documents that match any of those queries (searchMode=any).

  2. Query terms of the identified subqueries are processed by the search analyzer.

    In your example, the terms blue and bear are processed by the analyzer individually. They are not modified since they are already lowercase. Neither of those tokens exists in your index, thus no results are returned.

    If your query looked as follows: "Blue Bear" (with quotes), it would be rewritten to "Blue Bear" - notice no change; the OR operator has not been put between the words since now you're looking for a phrase. The query parser passes the entire phrase term (two words) to the analyzer, which in turn outputs a single, lowercased term: blue bear. This token matches what's in your index.
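
To make the order of operations concrete, here is a minimal Python sketch of the two steps above: parse first, then analyze each subquery. It is a simplification under stated assumptions: the parser only distinguishes quoted phrases from bare words and ignores operators such as +, | and *, results are combined as in searchMode=any, and every name in it (parse_query, analyze, search, the hard-coded index) is hypothetical.

```python
import re


def analyze(text: str) -> list[str]:
    """Toy keyword tokenizer + lowercase filter: one lowercased term."""
    return [text.lower()]


def parse_query(query: str) -> list[str]:
    """Split the query string into subqueries BEFORE any analysis.

    A quoted phrase stays whole (phrase query); every other
    whitespace-separated word becomes its own term query.
    """
    subqueries = []
    for phrase, word in re.findall(r'"([^"]*)"|(\S+)', query):
        text = phrase or word
        if text:
            subqueries.append(text)
    return subqueries


def search(query: str, index: dict[str, set[str]]) -> set[str]:
    """searchMode=any: a document matches if any subquery matches."""
    hits: set[str] = set()
    for subquery in parse_query(query):
        for term in analyze(subquery):
            hits |= index.get(term, set())
    return hits


# Inverted index as produced by the keyword analyzer at indexing time.
index = {"blue bear": {"1"}, "blue bear123": {"1"}}

print(parse_query("blue bear"))      # ['blue', 'bear']
print(search("blue bear", index))    # set()
print(search('"Blue Bear"', index))  # {'1'}
```

The unquoted query is split into blue and bear before the analyzer ever sees it, so the analyzer has no chance to keep the two words together.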

The key lesson here is that the query parser processes the query string before the analyzers are applied. The analyzers are applied to individual terms of subqueries identified by the query parser.

I hope this helps you understand the behavior you're observing. Note that you can test the output of your custom analyzer using the Analyze API.
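
As a rough sketch of what an Analyze API call could look like, the Python below builds (but does not send) the HTTP request using only the standard library. The service name, index name, analyzer name, key and api-version are placeholders and assumptions, not values from the question; check the request shape against the REST reference for the API version you use.

```python
import json
import urllib.request


def build_analyze_request(service: str, index_name: str, api_version: str,
                          api_key: str, text: str,
                          analyzer: str) -> urllib.request.Request:
    """Build a POST request for the index's analyze endpoint."""
    url = (f"https://{service}.search.windows.net/indexes/{index_name}"
           f"/analyze?api-version={api_version}")
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )


# All argument values below are placeholders (assumptions).
req = build_analyze_request(
    service="my-service",
    index_name="my-index",
    api_version="2020-06-30",
    api_key="<admin-key>",
    text="Blue Bear",
    analyzer="my_keyword_analyzer",
)
print(req.get_method(), req.full_url)

# To actually send it (requires a real service and key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

With the keyword tokenizer and lowercase filter described above, you would expect the service to report a single token, blue bear, for the input Blue Bear.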
