Elastic in-doc date comparison issue

Submitted by 半世苍凉 on 2019-12-24 18:33:54

Question


I have an Elasticsearch index with thousands of docs like this:

{
    Name: John Doe,
    FirstJobStartDate: 8/9/2016,
    FirstJobEndDate: 1/4/2019,
    SecondJobStartDate: 7/4/2019,
    SecondJobEndDate: 8/8/2020,
    ThirdJobStartDate: 1/9/2020
}

Every field except Name and FirstJobStartDate is optional and may be absent from a doc.

I need to get 4 numbers:

1) How many docs have a FirstJobEndDate? That's easy:

{
  "size":1,    
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "exists": {
                "field": "FirstJobEndDate"
              }
            }
          ]
        }
      }
    }
  }
}

Now it gets complex:

2) How many docs have a FirstJobEndDate that is earlier than the current date and have NONE of (SecondJobStartDate, SecondJobEndDate, ThirdJobStartDate)?

3) How many docs have a FirstJobEndDate, have AT LEAST ONE of (SecondJobStartDate, SecondJobEndDate, ThirdJobStartDate), and have ANY of those dates within 1 year of FirstJobEndDate?

4) How many docs have a FirstJobEndDate, have AT LEAST ONE of (SecondJobStartDate, SecondJobEndDate, ThirdJobStartDate), and have NONE of those dates within 1 year of FirstJobEndDate?

I believe this can be done with the right mix of 'must' and 'should', but I can't find a clear solution because it requires comparing two dates within the same document.

Just to confirm, all the dates are valid elastic date type fields and not strings.

Any help would be greatly appreciated. Elastic version: 2.4


Answer 1:


Try these:

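For the second query (the range filter covers the "earlier than the current date" condition, and it only matches docs that actually have a FirstJobEndDate, so no separate exists clause is needed):

{
  "size": 1,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "FirstJobEndDate": {
              "lt": "now"
            }
          }
        }
      ],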
      "must_not": [
        {
          "exists": {
            "field": "SecondJobStartDate"
          }
        },
        {
          "exists": {
            "field": "SecondJobEndDate"
          }
        },
        {
          "exists": {
            "field": "ThirdJobStartDate"
          }
        }
      ]
    }
  }
}

For the third query (each script first checks that the field has a value via .empty, since as far as I know .date is not null for a missing field in 2.x, and then that the date lies no more than roughly one year, 31540000000 ms, after FirstJobEndDate):

{
  "size": 1,
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "FirstJobEndDate"
          }
        }
      ],
      "minimum_should_match": 1,
      "should": [
        {
          "script": {
            "script": "!doc.SecondJobStartDate.empty && doc.SecondJobStartDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31540000000"
          }
        },
        {
          "script": {
            "script": "!doc.SecondJobEndDate.empty && doc.SecondJobEndDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31540000000"
          }
        },
        {
          "script": {
            "script": "!doc.ThirdJobStartDate.empty && doc.ThirdJobStartDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31540000000"
          }
        }
      ]
    }
  }
}

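For the fourth query (the should clauses require at least one of the later dates to be present, and the must_not clauses reject any doc where a present date falls within a year of FirstJobEndDate):

{
  "size": 1,
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "FirstJobEndDate"
          }
        }
      ],
      "minimum_should_match": 1,
      "should": [
        {
          "exists": {
            "field": "SecondJobStartDate"
          }
        },
        {
          "exists": {
            "field": "SecondJobEndDate"
          }
        },
        {
          "exists": {
            "field": "ThirdJobStartDate"
          }
        }
      ],
      "must_not": [
        {
          "script": {
            "script": "!doc.SecondJobStartDate.empty && doc.SecondJobStartDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31540000000"
          }
        },
        {
          "script": {
            "script": "!doc.SecondJobEndDate.empty && doc.SecondJobEndDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31540000000"
          }
        },
        {
          "script": {
            "script": "!doc.ThirdJobStartDate.empty && doc.ThirdJobStartDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31540000000"
          }
        }
      ]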
    }
  }
}
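
To sanity-check the counts on a handful of docs, the conditions that queries 2 to 4 are meant to express can be sketched in plain Python. This is only an illustration, not something that runs inside Elasticsearch: the classify function and the sample dates are made up, "one year" is assumed to mean 365 days, and "within 1 year" is read the same one-directional way as in the scripts (later date minus FirstJobEndDate).

```python
from datetime import date, timedelta

ONE_YEAR = timedelta(days=365)  # assumption: "one year" means 365 days
LATER_FIELDS = ("SecondJobStartDate", "SecondJobEndDate", "ThirdJobStartDate")


def classify(doc, today):
    """Return which of questions 2, 3 or 4 a doc counts toward, or None."""
    first_end = doc.get("FirstJobEndDate")
    if first_end is None:
        return None  # none of the questions count docs without FirstJobEndDate
    later = [doc[f] for f in LATER_FIELDS if f in doc]
    if not later:
        # Question 2 also requires the first job to have ended before today.
        return 2 if first_end < today else None
    # One-directional test: later date minus FirstJobEndDate is at most one
    # year (a later date that precedes FirstJobEndDate also passes).
    if any(d - first_end <= ONE_YEAR for d in later):
        return 3
    return 4


# Hypothetical sample docs, not taken from the question.
today = date(2021, 1, 1)
only_first = {"FirstJobEndDate": date(2019, 4, 1)}
quick_move = {"FirstJobEndDate": date(2019, 4, 1), "SecondJobStartDate": date(2019, 4, 7)}
long_gap = {"FirstJobEndDate": date(2019, 4, 1), "ThirdJobStartDate": date(2020, 9, 1)}
print(classify(only_first, today), classify(quick_move, today), classify(long_gap, today))
```

Every doc with a FirstJobEndDate lands in at most one of the three buckets, so the three counts should never add up to more than the count from the first query.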

Just a tip: as you can see, these queries rely on scripting, which can hurt performance. Since you know in advance which dates you want to compare, you could compute the date differences at index time, store them in additional numeric fields, and then filter on those fields with plain range queries.
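
A minimal sketch of that idea in Python, assuming the docs pass through your own code before being indexed. The "<Field>GapDays" field names and both helper functions are hypothetical; they are not part of the original mapping or of any Elasticsearch client. Dates are handled as datetime.date objects here and would still need to be serialized in whatever format your mapping expects.

```python
from datetime import date

# Hypothetical naming scheme: "<Field>GapDays" does not exist in the original mapping.
LATER_FIELDS = ("SecondJobStartDate", "SecondJobEndDate", "ThirdJobStartDate")


def add_gap_fields(doc):
    """Return a copy of doc with the day gap from FirstJobEndDate for each later date present."""
    enriched = dict(doc)
    first_end = doc.get("FirstJobEndDate")
    if first_end is None:
        return enriched  # nothing to compare against
    for field in LATER_FIELDS:
        if field in doc:
            enriched[field + "GapDays"] = (doc[field] - first_end).days
    return enriched


def within_one_year_query():
    """Build a query body for the third question using range clauses instead of scripts."""
    return {
        "size": 1,
        "query": {
            "bool": {
                "filter": [{"exists": {"field": "FirstJobEndDate"}}],
                "minimum_should_match": 1,
                "should": [
                    # Assumption: "one year" is taken as 365 days.
                    {"range": {field + "GapDays": {"lte": 365}}}
                    for field in LATER_FIELDS
                ],
            }
        },
    }


# Hypothetical sample doc, not taken from the question.
enriched = add_gap_fields(
    {"FirstJobEndDate": date(2019, 4, 1), "SecondJobStartDate": date(2019, 4, 7)}
)
query = within_one_year_query()
```

Because a gap field is only written when the corresponding date exists, a range clause on it never matches a doc that lacks that date, so no extra presence check is needed.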



Source: https://stackoverflow.com/questions/48951313/elastic-in-doc-date-comparison-issue
