Emails not being searched properly in elasticsearch

说谎 2020-12-22 00:37

I have indexed a few documents in elasticsearch which have email ids as a field. But when I query for a specific email id, the search results are showing all the documents w

2 Answers
  • 2020-12-22 01:11

    This happens when you use the default mappings. Elasticsearch provides the uax_url_email tokenizer, which recognizes URLs and email addresses and keeps each one as a single token. See the Elasticsearch documentation on this tokenizer for details.

  • 2020-12-22 01:24

    By default, your mail-id field is analyzed by the standard analyzer which will tokenize the email abc@gmail.com into the following two tokens:

    {
      "tokens" : [ {
        "token" : "abc",
        "start_offset" : 0,
        "end_offset" : 3,
        "type" : "<ALPHANUM>",
        "position" : 1
      }, {
        "token" : "gmail.com",
        "start_offset" : 4,
        "end_offset" : 13,
        "type" : "<ALPHANUM>",
        "position" : 2
      } ]
    }
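To check how the standard analyzer splits an address on your own cluster, a request along these lines may help. It is a minimal sketch that assumes a recent Elasticsearch version, where the _analyze API takes a JSON body; ES_URL, ANALYZE_STANDARD_BODY and analyze_standard are illustrative names that do not come from the original answer.

```shell
# Assumed endpoint; override ES_URL to point at your own cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Ask the built-in standard analyzer to tokenize the sample address.
ANALYZE_STANDARD_BODY='{"analyzer": "standard", "text": "abc@gmail.com"}'

# Hypothetical helper: sends the _analyze request only when called.
analyze_standard() {
  curl -s -XGET "$ES_URL/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "$ANALYZE_STANDARD_BODY"
}
```

Calling analyze_standard prints the token list returned by the cluster.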
    

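    Because a full-text query on that field is analyzed the same way, every document that shares a token such as gmail.com can match, which is why a search for one address returns many documents. What you need instead is a custom analyzer based on the UAX URL email tokenizer, which keeps each email address as a single token.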

    So you need to define your index as follows:

    curl -XPUT localhost:9200/people -d '{
      "settings": {
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "type": "custom",
              "tokenizer": "uax_url_email"
            }
          }
        }
      },
      "mappings": {
        "person": {
          "properties": {
            "mail-id": {
              "type": "string",
              "analyzer": "my_analyzer"
            }
          }
        }
      }
    }'
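The request above uses the syntax of older Elasticsearch releases. As a hedged sketch, assuming Elasticsearch 7.x or later (where mapping types such as person are no longer used, the string type has been replaced by text, and requests with a body need a Content-Type header), the same index might be created as follows. ES_URL, INDEX_BODY and create_people_index are illustrative names, not part of the original answer.

```shell
# Assumed endpoint; override ES_URL to point at your own cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Same custom analyzer as in the answer, written with typeless mappings
# and the "text" field type (assumption: Elasticsearch 7.x or later).
INDEX_BODY='{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "uax_url_email"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "mail-id": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}'

# Hypothetical helper: sends the index-creation request only when called.
create_people_index() {
  curl -s -XPUT "$ES_URL/people" \
    -H 'Content-Type: application/json' \
    -d "$INDEX_BODY"
}
```

Run create_people_index once against an empty cluster; existing documents have to be reindexed before the new analyzer applies to them.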
    

    After creating that index, you can see that the email abc@gmail.com will be tokenized as a single token and your search will work as expected.

     curl -XGET 'localhost:9200/people/_analyze?analyzer=my_analyzer&pretty' -d 'abc@gmail.com'
    {
      "tokens" : [ {
        "token" : "abc@gmail.com",
        "start_offset" : 0,
        "end_offset" : 13,
        "type" : "<EMAIL>",
        "position" : 1
      } ]
    }
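On recent Elasticsearch versions the analyzer name and text are passed to _analyze in a JSON body instead of the query string. The sketch below shows that form together with a match query for one address, assuming the people index and the mail-id field defined earlier; ES_URL, ANALYZE_BODY, SEARCH_BODY and the two helper functions are illustrative names, not part of the original answer.

```shell
# Assumed endpoint; override ES_URL to point at your own cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Tokenize the sample address with the custom analyzer of the people index.
ANALYZE_BODY='{"analyzer": "my_analyzer", "text": "abc@gmail.com"}'

# Search for a single address; the match query analyzes its input with the
# analyzer of the mail-id field, so the address stays one token.
SEARCH_BODY='{"query": {"match": {"mail-id": "abc@gmail.com"}}}'

# Hypothetical helpers: each sends its request only when called.
analyze_email() {
  curl -s -XGET "$ES_URL/people/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "$ANALYZE_BODY"
}

search_email() {
  curl -s -XGET "$ES_URL/people/_search?pretty" \
    -H 'Content-Type: application/json' \
    -d "$SEARCH_BODY"
}
```

Calling analyze_email shows the tokens produced by my_analyzer, and search_email runs the query against the people index.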
    