elasticsearch query issue with ngram


Question


I have this data in my index:

https://gist.github.com/bitgandtter/6794d9b48ae914a3ac7c

If you look at the mapping, you'll notice I'm using ngrams from 3 characters up to 20.
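
The linked gist contains the actual mapping; as a minimal sketch of what such a setup typically looks like in Sense (the tokenizer name custom_ngram_tokenizer is hypothetical; only custom_analyzer and custom_search_analyzer appear in the original, and only one field is shown, the others being analogous):

PUT /my_index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "custom_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": 3,
          "max_gram": 20
        }
      },
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "custom_ngram_tokenizer",
          "filter": [ "lowercase" ]
        },
        "custom_search_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "user": {
      "properties": {
        "firstname": {
          "type": "string",
          "index_analyzer": "custom_analyzer",
          "search_analyzer": "custom_search_analyzer"
        }
      }
    }
  }
}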

When I execute this query:

GET /my_index/user/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "filtered": { 
      "query":{
        "multi_match":{
          "query": "F",
          "fields": ["username","firstname","middlename","lastname"],
          "analyzer": "custom_search_analyzer"
        }
      }
    }
  }
}

I should get the 8 documents I have indexed, but I only get 6; the two left out are named Franz and Francis. I expect those two as well, because the "F" is included in the data, but for some reason it's not working.

When I execute:

GET /my_index/user/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "filtered": { 
      "query":{
        "multi_match":{
          "query": "Fran",
          "fields": ["username","firstname","middlename","lastname"],
          "analyzer": "custom_search_analyzer"
        }
      }
    }
  }
}

I get those two documents.

If I lower the ngram to start at 1, I get all the documents, but I think this will hurt query performance.

What am I missing here? Thanks in advance.

NOTE: all the examples are written using Sense.


Answer 1:


This is expected: since min_gram is specified as 3, the minimum length of any token produced by the custom analyzer is 3 characters.

Hence the shortest token produced for "Franz Silva" is "Fra", so the single-character query "F" does not match any token indexed for this document.
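
To make that concrete: with min_gram 3, max_gram 20 and no token_chars restriction, the tokenizer emits every substring of the whole input between 3 and 20 characters long, so (ignoring any lowercase filter) the tokens for "Franz Silva" include:

"Fra", "Fran", "Franz", "Franz ", "Franz S", ...
"ran", "ranz", "ranz ", ...
"Sil", "Silv", "Silva"

No one-character token such as "F" is ever produced, which is why the first query cannot match these two documents.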

One can test the tokens produced by the analyzer with:

curl -XGET "http://<server>/index_name/_analyze?analyzer=custom_analyzer&text=Franz%20Silva"
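
The response lists each token with its offsets; abbreviated, and assuming the analyzer includes a lowercase filter, it would look roughly like:

{
  "tokens": [
    { "token": "fra",  "start_offset": 0, "end_offset": 3, "type": "word", "position": 1 },
    { "token": "fran", "start_offset": 0, "end_offset": 4, "type": "word", "position": 2 },
    ...
  ]
}

The absence of any one-character token confirms why the query "F" misses these documents.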

Also note that since the "custom_analyzer" specified above does not specify "token_chars", the tokens can contain spaces.
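
If grams spanning the space between first and last name are unwanted, token_chars can restrict which character classes may appear inside a token; a sketch, reusing the hypothetical tokenizer name from the mapping sketch above:

"tokenizer": {
  "custom_ngram_tokenizer": {
    "type": "nGram",
    "min_gram": 3,
    "max_gram": 20,
    "token_chars": [ "letter", "digit" ]
  }
}

With this, "Franz Silva" is first broken at the space, and grams such as "Fra" and "Sil" are generated from each word separately.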



Source: https://stackoverflow.com/questions/29635635/elasticsearch-query-issue-with-ngram
