Question
I have an Elasticsearch index with thousands of docs like this:
{
Name: John Doe,
FirstJobStartDate: 8/9/2016,
FirstJobEndDate:1/4/2019,
SecondJobStartDate:7/4/2019,
SecondJobEndDate:8/8/2020,
ThirdJobStartDate: 1/9/2020,
}
Except for Name & FirstJobStartDate, any other field is optional and may or may not be present in the doc.
I need to get 4 numbers:
1) How many docs have a FirstJobEndDate? That's easy
{
"size":1,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"exists": {
"field": "FirstJobEndDate"
}
}
]
}
}
}
}
}
Now it gets complex:
2) How many docs have a FirstJobEndDate that is earlier than the current date and have NONE of (SecondJobStartDate, SecondJobEndDate, ThirdJobStartDate)?
3) How many docs have a FirstJobEndDate, also have ANY ONE of (SecondJobStartDate, SecondJobEndDate, ThirdJobStartDate) and ANY ONE of those dates is within 1 Year of FirstJobEndDate?
4) How many docs have a FirstJobEndDate, also have ANY ONE of (SecondJobStartDate, SecondJobEndDate, ThirdJobStartDate) and NONE of those dates is within 1 Year of FirstJobEndDate?
I believe this can be done with a correct mix of 'must' and 'should', but can't get any clear solution because of the comparison between two dates within the same document.
Just to confirm, all the dates are valid elastic date type fields and not strings.
Any help would be greatly appreciated. Elastic version: 2.4
Answer 1:
Try these:
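For the second query (the 'range' filter on FirstJobEndDate only matches docs where the field exists, so a separate 'exists' clause isn't needed):
{
"size": 1,
"query": {
"bool": {
"filter": [
{
"range": {
"FirstJobEndDate": {
"lt": "now"
}
}
}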
],
"must_not": [
{
"exists": {
"field": "SecondJobStartDate"
}
},
{
"exists": {
"field": "SecondJobEndDate"
}
},
{
"exists": {
"field": "ThirdJobStartDate"
}
}
]
}
}
}
For the third query (the '.empty' check skips fields that are missing from the doc, and 31536000000 is 365 days in milliseconds):
{
"size": 1,
"query": {
"bool": {
"filter": [
{
"exists": {
"field": "FirstJobEndDate"
}
}
],
"minimum_should_match": 1,
"should": [
{
"script": {
"script": "!doc.SecondJobStartDate.empty && doc.SecondJobStartDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31536000000"
}
},
{
"script": {
"script": "!doc.SecondJobEndDate.empty && doc.SecondJobEndDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31536000000"
}
},
{
"script": {
"script": "!doc.ThirdJobStartDate.empty && doc.ThirdJobStartDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31536000000"
}
}
]
}
}
}
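For the fourth query (the 'should' clause requires at least one of the later dates to exist, and 'must_not' rejects the doc if any of them falls within one year):
{
"size": 1,
"query": {
"bool": {
"filter": [
{
"exists": {
"field": "FirstJobEndDate"
}
}
],
"must_not": [
{
"script": {
"script": "!doc.SecondJobStartDate.empty && doc.SecondJobStartDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31536000000"
}
},
{
"script": {
"script": "!doc.SecondJobEndDate.empty && doc.SecondJobEndDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31536000000"
}
},
{
"script": {
"script": "!doc.ThirdJobStartDate.empty && doc.ThirdJobStartDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31536000000"
}
}
],
"should": [
{
"exists": {
"field": "SecondJobStartDate"
}
},
{
"exists": {
"field": "SecondJobEndDate"
}
},
{
"exists": {
"field": "ThirdJobStartDate"
}
}
],
"minimum_should_match": 1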
}
}
}
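If you want to sanity-check the counts on a handful of sample docs before trusting the queries, the four conditions can be mirrored in plain Python. This is only a rough sketch, not something Elasticsearch runs: the helper names (parse, classify) are made up for illustration, the dates are assumed to be d/m/Y strings as in the sample doc, and "within 1 year" is taken as a 365-day window measured the same way as in the scripts (later date minus FirstJobEndDate, no absolute value).

```python
from datetime import datetime, timedelta

# Optional fields that are compared against FirstJobEndDate.
OTHER_FIELDS = ("SecondJobStartDate", "SecondJobEndDate", "ThirdJobStartDate")
ONE_YEAR = timedelta(days=365)  # assumption: "1 year" means 365 days


def parse(value):
    """Parse a d/m/Y string (assumed format) into a datetime."""
    return datetime.strptime(value, "%d/%m/%Y")


def classify(doc, today):
    """Return the set of question numbers (1 to 4) this doc counts toward."""
    matched = set()
    if "FirstJobEndDate" not in doc:
        return matched
    matched.add(1)
    first_end = parse(doc["FirstJobEndDate"])
    later = [parse(doc[f]) for f in OTHER_FIELDS if f in doc]
    if not later:
        # Question 2: no later dates at all, first job already ended.
        if first_end < today:
            matched.add(2)
        return matched
    # Questions 3 and 4: same comparison as the scripts.
    if any(d - first_end <= ONE_YEAR for d in later):
        matched.add(3)
    else:
        matched.add(4)
    return matched


sample = {
    "Name": "John Doe",
    "FirstJobStartDate": "8/9/2016",
    "FirstJobEndDate": "1/4/2019",
    "SecondJobStartDate": "7/4/2019",
}
print(sorted(classify(sample, datetime(2021, 1, 1))))  # prints [1, 3]
```

Counting how many sample docs land in each bucket and comparing that with hits.total from each query is a cheap way to catch a clause that matches too much or too little.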
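Just a tip: these queries rely on scripting, which can hurt performance, and in 2.x inline Groovy scripts are disabled by default, so they have to be enabled first (for example with script.inline: true in elasticsearch.yml). Since you know beforehand which dates you want to compare, you could instead store the date differences in additional scalar fields at index time and compare them with plain range queries afterwards.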
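As a rough sketch of that idea, the snippet below computes the differences before indexing and builds the third query with range clauses instead of scripts. Everything here is an assumption for illustration: the '<field>GapDays' field names and both helper functions are hypothetical, the dates are assumed to be d/m/Y strings as in the sample doc, and one year is taken as 365 days.

```python
from datetime import datetime

# Optional fields that are compared against FirstJobEndDate.
OTHER_FIELDS = ("SecondJobStartDate", "SecondJobEndDate", "ThirdJobStartDate")


def parse(value):
    """Parse a d/m/Y string (assumed format) into a datetime."""
    return datetime.strptime(value, "%d/%m/%Y")


def add_gap_fields(doc):
    """Return a copy of doc with a hypothetical '<field>GapDays' integer
    for every later date that is present (days since FirstJobEndDate)."""
    enriched = dict(doc)
    if "FirstJobEndDate" not in doc:
        return enriched
    first_end = parse(doc["FirstJobEndDate"])
    for field in OTHER_FIELDS:
        if field in doc:
            enriched[field + "GapDays"] = (parse(doc[field]) - first_end).days
    return enriched


def within_one_year_query():
    """Third query rewritten with range clauses on the gap fields.
    A gap field only exists when the later date exists, so each range
    clause also covers the 'field is present' check."""
    return {
        "size": 1,
        "query": {
            "bool": {
                "filter": [{"exists": {"field": "FirstJobEndDate"}}],
                "minimum_should_match": 1,
                "should": [
                    {"range": {field + "GapDays": {"lte": 365}}}
                    for field in OTHER_FIELDS
                ],
            }
        },
    }


doc = {
    "Name": "John Doe",
    "FirstJobStartDate": "8/9/2016",
    "FirstJobEndDate": "1/4/2019",
    "SecondJobStartDate": "7/4/2019",
    "SecondJobEndDate": "8/8/2020",
    "ThirdJobStartDate": "1/9/2020",
}
print(add_gap_fields(doc)["SecondJobStartDateGapDays"])  # prints 6
```

The fourth query could be built the same way by moving the range clauses into must_not and requiring at least one of the later dates to exist. The trade-off is that existing docs have to be reindexed once so that they pick up the new fields.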
Source: https://stackoverflow.com/questions/48951313/elastic-in-doc-date-comparison-issue