case insensitive elasticsearch with uppercase or lowercase

我的梦境 submitted on 2019-12-04 17:45:54

Is there a specific reason you are using ngram? Elasticsearch applies the same analyzer to the query as to the text you index, unless search_analyzer is explicitly specified, as mentioned by @Adam in his answer. In your case it might be enough to use a standard tokenizer with a lowercase filter.

I created an index with the following settings and mapping:

{
   "settings": {
      "analysis": {
         "analyzer": {
            "custom_analyzer": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "lowercase"
               ]
            }
         }
      }
   },
   "mappings": {
      "test_mapping": {
         "properties": {
            "name": {
               "type": "string",
               "analyzer": "custom_analyzer"
            },
            "Description": {
               "type": "string",
               "analyzer": "custom_analyzer"
            }
         }
      }
   }
}

I indexed two documents.

Doc 1

PUT /test_index/test_mapping/1
    {
        "name" : "Sara Connor",
        "Description" : "My real name is Sarah Connor."
    }

Doc 2

PUT /test_index/test_mapping/2
    {
        "name" : "John Connor",
        "Description" : "I might save humanity someday."
    }

Then I ran a simple search:

POST /test_index/_search
{
    "query" : {
        "match" : {
            "name" : "SARA"
        }
    }
}

The search returns only the first document. I also tried "sara" and "Sara", with the same result.

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.19178301,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_mapping",
        "_id": "1",
        "_score": 0.19178301,
        "_source": {
          "name": "Sara Connor",
          "Description": "My real name is Sarah Connor."
        }
      }
    ]
  }
}
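
Why all three spellings return the same hit can be modeled in a few lines of Python. This is only a rough sketch, not how Elasticsearch is implemented: the functions `analyze`, `build_index` and `match` are hypothetical names, and a regex split stands in for the standard tokenizer. The point it illustrates is that the same lowercasing analyzer runs on the indexed text and on the query.

```python
import re

def analyze(text):
    """Rough stand-in for custom_analyzer: split into words, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def build_index(docs, field):
    """Map each analyzed token to the ids of the documents containing it."""
    index = {}
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):
            index.setdefault(token, set()).add(doc_id)
    return index

def match(index, query):
    """Analyze the query the same way and collect ids matching any token."""
    hits = set()
    for token in analyze(query):
        hits |= index.get(token, set())
    return hits

docs = {
    "1": {"name": "Sara Connor"},
    "2": {"name": "John Connor"},
}
name_index = build_index(docs, "name")

for query in ("sara", "Sara", "SARA"):
    # Each spelling is lowercased to "sara" before the lookup.
    print(query, sorted(match(name_index, query)))
```

Because both sides are lowercased before they are compared, the case of the query never reaches the lookup.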

Analysis runs twice for full-text (analyzed) fields: once when the data is indexed and again when you search. Note that the JSON you indexed is returned unchanged in search results; analysis is only used to create the tokens for the inverted index. The key steps for your solution are:

  1. Create two analyzers, one with the ngram filter and one without. The search query does not need to be analyzed with ngram, because you already have the exact value you want to search for.
  2. Define the mappings for your fields correctly. A field mapping has two settings that specify analyzers: analyzer is used at index time and search_analyzer is used at search time. If you specify only analyzer, it is used at both index and search time.

You can read more about it here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html
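
The split between index-time and search-time analysis can be sketched in Python as follows. This is a simplified model under stated assumptions, not Elasticsearch code: `tokenize`, `ngrams`, `index_analyzer`, `search_analyzer` and `matches` are hypothetical names, the regex tokenizer is assumed to keep a value like "Sara_11_01" as a single token, and the gram sizes 1 to 5 are taken from the settings in this answer.

```python
import re

def tokenize(text):
    """Simplified tokenizer; underscores do not split a token here."""
    return re.findall(r"\w+", text)

def ngrams(token, min_gram=1, max_gram=5):
    """All substrings of the token with length min_gram..max_gram."""
    return [
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    ]

def index_analyzer(text):
    """Index time: ngram filter, then lowercase."""
    return {gram.lower() for token in tokenize(text) for gram in ngrams(token)}

def search_analyzer(text):
    """Search time: no ngrams, only lowercasing of whole tokens."""
    return {token.lower() for token in tokenize(text)}

indexed_terms = index_analyzer("Sara_11_01")

def matches(query):
    """A query matches if any of its analyzed tokens was indexed."""
    return bool(search_analyzer(query) & indexed_terms)

for query in ("sara", "SARA", "SaRa"):
    print(query, matches(query))
```

In this model a query longer than max_gram, such as "sara_11", does not match, because no indexed gram is that long. That is a consequence of the chosen gram sizes rather than of case handling.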

Your code should look like this:

PUT /my_index
{
   "settings": {
      "analysis": {
         "filter": {
            "ngram_filter": {
               "type": "ngram",
               "min_gram": 1,
               "max_gram": 5
            }
         },
         "analyzer": {
            "index_store_ngram": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "ngram_filter",
                  "lowercase"
               ]
            }
         }
      }
   },
   "mappings": {
      "my_type": {
         "properties": {
            "name": {
               "type": "string",
               "analyzer": "index_store_ngram",
               "search_analyzer": "standard"
            }
         }
      }
   }
}

POST /my_index/my_type/1
{
     "name": "Sara_11_01"
}

GET /my_index/my_type/_search
{
    "query": {
        "match": {
           "name": "sara"
        }
    }
}

GET /my_index/my_type/_search
{
    "query": {
        "match": {
           "name": "SARA"
        }
    }
}

GET /my_index/my_type/_search
{
    "query": {
        "match": {
           "name": "SaRa"
        }
    }
}

Edit 1: updated code for a new example provided in the question
