Exact-match, case-insensitive match without normalization in Elasticsearch 6.2

谁说胖子不能爱 submitted on 2020-05-17 04:55:30

Question


I have looked at every article and post I could find about performing exact-match, case-insensitive queries, but once implemented, none of them do what I am looking for.

Before you mark this question as a duplicate, please read the entire post.

Given a username, I want to query my Elasticsearch database so that it returns only the document whose username matches exactly, ignoring case.

I have tried specifying a lowercase analyzer for my username property and using a match query to implement this behavior. While this solves the problem of case-insensitive matching, it fails at exact matching.

I looked into using a lowercase normalizer, but that would make all of my usernames lowercase before indexing, so when I aggregate the usernames, they would return in lowercase form, which is not what I want. I need to preserve the original case of each letter in the username.

What I want is the following behavior:


Inserting Users

POST {elastic}/users/_doc

{
    "email": "random@email.com",
    "username": "UsErNaMe",
    "password": "1234567"
}

This document will be stored in an index called users exactly the way it is.

Getting a User by Username

GET {frontend}/user/UsErNaMe

should return

{
    "email": "random@email.com",
    "username": "UsErNaMe",
    "password": "1234567"
}

and

GET {frontend}/user/username

should return

{
    "email": "random@email.com",
    "username": "UsErNaMe",
    "password": "1234567"
}

and

GET {frontend}/user/USERNAME

should return

{
    "email": "random@email.com",
    "username": "UsErNaMe",
    "password": "1234567"
}

and

GET {frontend}/user/UsErNaMe $RaNdoM LeTteRs

should NOT return anything.

Thank you.


Answer 1:


To achieve a case-insensitive exact match, you need to define your own analyzer. The analyzer needs to do two things:

  1. lowercase the input value (for case insensitivity).
  2. make no other modification to the input, in particular no splitting into multiple tokens (for exact matching).

Both can be achieved as follows:

  1. set the tokenizer to keyword, so that the whole input value is emitted as a single token.
  2. add the lowercase token filter to the custom analyzer, so that this single token is lowercased. Token filters always run after the tokenizer.

Now this custom analyzer can be applied to a text field where case insensitive exact search is required.

To create the index, you can use the following:

PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "case_insensitive_analyzer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "email": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "username": {
          "type": "text",
          "analyzer": "case_insensitive_analyzer"
        },
        "password": {
          "type": "keyword"
        }
      }
    }
  }
}

In the mapping above, case_insensitive_analyzer is the custom analyzer, and as you can see it is applied to the username field.

So when you index a document as below:

PUT test/_doc/1
{
  "email": "random@email.com",
  "username": "UsErNaMe",
  "password": "1234567"
}

the input for the username field is UsErNaMe. The analyzer first runs the keyword tokenizer, which does nothing but emit the entire input as a single token, i.e. UsErNaMe. The lowercase filter is then applied to that token, producing username, and this is the term that gets indexed.

Now you can use a match query as below to search against the username field:
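The analysis chain can be emulated in a few lines of Python to make the order of operations concrete. This is only an illustrative sketch, not Elasticsearch code: the function names keyword_tokenizer, lowercase_filter and case_insensitive_analyzer are made up for this example, and Python's str.lower() is assumed to be a close enough stand-in for the lowercase token filter on plain ASCII input.

```python
def keyword_tokenizer(text):
    """Stand-in for the keyword tokenizer: the whole input becomes one token."""
    return [text]


def lowercase_filter(tokens):
    """Stand-in for the lowercase token filter: lowercase every token."""
    return [token.lower() for token in tokens]


def case_insensitive_analyzer(text):
    """Tokenizer first, then token filters, mirroring the analyzer pipeline."""
    return lowercase_filter(keyword_tokenizer(text))


print(case_insensitive_analyzer("UsErNaMe"))
# ['username']
print(case_insensitive_analyzer("UsErNaMe $RaNdoM LeTteRs"))
# ['username $random letters']
```

Because the input is never split, a longer string produces one longer token instead of several short ones, which is why it cannot match the indexed term username.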

GET test/_doc/_search
{
  "query": {
    "match": {
      "username": "USERNAME"
    }
  }
}

The query above will give you the desired output. Replace USERNAME in the query with username, UsErNaMe or USERname, and all of them will match the document. The reason is that if no analyzer is explicitly specified at search time, Elasticsearch uses the analyzer that was applied to the field at index time. In this case, when searching against the username field, case_insensitive_analyzer is applied to the input value USERNAME, which produces the token username and hence the match. An input such as UsErNaMe $RaNdoM LeTteRs produces a different single token and therefore does not match. Since only the indexed term is lowercased, the _source returned with the hit still contains the original UsErNaMe.
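The search-time behavior can be sketched the same way. The snippet below is a toy model, assuming a hypothetical TinyIndex class and analyze helper invented for illustration; it is not how Elasticsearch is implemented, but it shows the same analyzer being applied at index time and at query time while the stored document keeps its original case.

```python
def analyze(text):
    # Emulates keyword tokenizer + lowercase filter: one lowercased token.
    return [text.lower()]


class TinyIndex:
    """Toy inverted index over the username field (illustration only)."""

    def __init__(self):
        self.terms = {}   # analyzed token -> list of document ids
        self.source = {}  # document id -> original, unmodified document

    def index(self, doc_id, doc):
        self.source[doc_id] = doc
        for token in analyze(doc["username"]):
            self.terms.setdefault(token, []).append(doc_id)

    def match_username(self, query):
        # The query text goes through the same analyzer as the indexed value.
        hits = []
        for token in analyze(query):
            for doc_id in self.terms.get(token, []):
                hits.append(self.source[doc_id])
        return hits


idx = TinyIndex()
idx.index(1, {"email": "random@email.com", "username": "UsErNaMe", "password": "1234567"})

for q in ["UsErNaMe", "username", "USERNAME", "UsErNaMe $RaNdoM LeTteRs"]:
    print(q, "->", [doc["username"] for doc in idx.match_username(q)])
```

The first three queries return the document with its username still spelled UsErNaMe, and the last one returns an empty list, which matches the behavior requested in the question.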



Source: https://stackoverflow.com/questions/55742477/exact-match-case-insensitive-match-without-normalization-in-elasticsearch-6-2
