ElasticSearch Analyzer and Tokenizer for Emails

孤独总比滥情好 2020-12-08 04:55

I could not find a perfect solution to the following situation, either through Google or in the Elasticsearch documentation. I hope someone can help.

Suppose there are five email addresses stored u

1 Answer
  • 2020-12-08 05:17

    Mapping:

    PUT /test
    {
      "settings": {
        "analysis": {
          "filter": {
            "email": {
              "type": "pattern_capture",
              "preserve_original": true,
              "patterns": [
                "([^@]+)",
                "(\\p{L}+)",
                "(\\d+)",
                "@(.+)",
                "([^-@]+)"
              ]
            }
          },
          "analyzer": {
            "email": {
              "tokenizer": "uax_url_email",
              "filter": [
                "email",
                "lowercase",
                "unique"
              ]
            }
          }
        }
      },
      "mappings": {
        "emails": {
          "properties": {
            "email": {
              "type": "string",
              "analyzer": "email"
            }
          }
        }
      }
    }
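To see which terms this analyzer writes to the index, the filter chain can be approximated in plain Python. This is only a sketch under stated assumptions: Python's re module has no \p{L} class, so [^\W\d_] stands in for it; the helper names pattern_capture and email_analyzer_filters are hypothetical; and captures are produced pattern by pattern rather than in the order Lucene emits them. Keeping the original token when preserve_original is true, or when no pattern matches, follows the behavior described for the pattern_capture filter.

```python
import re

# Patterns copied from the "email" pattern_capture filter in the mapping.
# Assumption: Python's re has no \p{L}, so [^\W\d_] approximates "any letter".
EMAIL_PATTERNS = [
    r"([^@]+)",
    r"([^\W\d_]+)",
    r"(\d+)",
    r"@(.+)",
    r"([^-@]+)",
]


def pattern_capture(token, patterns=EMAIL_PATTERNS, preserve_original=True):
    """Hypothetical approximation of the pattern_capture filter for one token."""
    captures = []
    for pattern in patterns:
        # Every non-overlapping match of every pattern is considered.
        for match in re.finditer(pattern, token):
            for group in range(1, match.re.groups + 1):
                start, end = match.span(group)
                if start != end:  # skip empty or non-participating groups
                    captures.append(token[start:end])
    # The original token is kept when preserve_original is set or nothing matched.
    if preserve_original or not captures:
        return [token] + captures
    return captures


def email_analyzer_filters(token):
    """Apply the analyzer's filters in order: email -> lowercase -> unique."""
    lowered = [term.lower() for term in pattern_capture(token)]
    # unique: keep only the first occurrence of each term
    return list(dict.fromkeys(lowered))


if __name__ == "__main__":
    for address in ("john.doe@gmail.com", "hello-john.doe@outlook.com"):
        print(address, "->", email_analyzer_filters(address))
```

Running it on hello-john.doe@outlook.com shows why the mapping includes the fifth pattern, ([^-@]+): none of the first four patterns yields john.doe for that address, so without it the document would not contain that term.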
    

    Test data:

    POST /test/emails/_bulk
    {"index":{"_id":"1"}}
    {"email": "john.doe@gmail.com"}
    {"index":{"_id":"2"}}
    {"email": "john.doe@gmail.com, john.doe@outlook.com"}
    {"index":{"_id":"3"}}
    {"email": "hello-john.doe@outlook.com"}
    {"index":{"_id":"4"}}
    {"email": "john.doe@outlook.com"}
    {"index":{"_id":"5"}}
    {"email": "john@yahoo.com"}
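In this bulk request, document 2 stores two addresses in a single string. The uax_url_email tokenizer keeps each email address as one token and discards the comma and space, so that document contributes two separate tokens before any filter runs. The sketch below imitates only that step. EMAIL_RE and tokenize are hypothetical, and the regex is a crude stand-in for the tokenizer's real UAX #29 and URL/email rules (it ignores non-email words entirely).

```python
import re

# Hypothetical stand-in for uax_url_email: grab runs shaped like an address
# (local part, "@", dotted domain). The real tokenizer follows the Unicode
# UAX #29 word-break rules plus URL/email recognition, and also emits plain words.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# The five documents from the bulk request.
DOCS = {
    "1": "john.doe@gmail.com",
    "2": "john.doe@gmail.com, john.doe@outlook.com",
    "3": "hello-john.doe@outlook.com",
    "4": "john.doe@outlook.com",
    "5": "john@yahoo.com",
}


def tokenize(text):
    """Return every address-shaped run in the field value, in order."""
    return EMAIL_RE.findall(text)


for doc_id, text in DOCS.items():
    print(doc_id, tokenize(text))
```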
    

    Query to be used:

    GET /test/emails/_search
    {
      "query": {
        "term": {
          "email": "john.doe@gmail.com"
        }
      }
    }
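A term query is not analyzed: Elasticsearch looks up the exact string in the inverted index. Because preserve_original keeps each full, lowercased address as a term, john.doe@gmail.com matches documents 1 and 2 only, while a shorter fragment such as john.doe reaches more documents. The sketch below rebuilds that index in memory for the five test documents. It relies on two approximations, a regex stand-in for uax_url_email and [^\W\d_] in place of \p{L}, and the helpers analyze, build_index and term_query are hypothetical.

```python
import re
from collections import defaultdict

# Assumptions: EMAIL_RE is a crude stand-in for the uax_url_email tokenizer,
# and [^\W\d_] replaces Java's \p{L}. Neither matches Elasticsearch exactly.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PATTERNS = [r"([^@]+)", r"([^\W\d_]+)", r"(\d+)", r"@(.+)", r"([^-@]+)"]

# The five documents from the bulk request.
DOCS = {
    "1": "john.doe@gmail.com",
    "2": "john.doe@gmail.com, john.doe@outlook.com",
    "3": "hello-john.doe@outlook.com",
    "4": "john.doe@outlook.com",
    "5": "john@yahoo.com",
}


def analyze(text):
    """Hypothetical analyzer: tokenize, capture (preserve_original), lowercase, unique."""
    terms = []
    for token in EMAIL_RE.findall(text):
        terms.append(token)  # preserve_original keeps the full address
        for pattern in PATTERNS:
            for match in re.finditer(pattern, token):
                for group in range(1, match.re.groups + 1):
                    start, end = match.span(group)
                    if start != end:  # skip empty captures
                        terms.append(token[start:end])
    return list(dict.fromkeys(term.lower() for term in terms))


def build_index(docs):
    """Map each indexed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def term_query(index, value):
    """A term query is not analyzed: the value must equal an indexed term exactly."""
    return sorted(index.get(value, set()))


INDEX = build_index(DOCS)
print(term_query(INDEX, "john.doe@gmail.com"))  # ['1', '2']
print(term_query(INDEX, "john.doe"))            # ['1', '2', '3', '4']
print(term_query(INDEX, "John.Doe@gmail.com"))  # []
```

Because the query value is not lowercased, a mixed-case value such as John.Doe@gmail.com finds nothing, even though it names the same address.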
    