How to derive a field from two fields in an Elasticsearch index?

Submitted by ﹥>﹥吖頭↗ on 2019-12-12 01:24:35

Question


I have an index with fields:

  • room_name
  • start_date (start time room is used)
  • end_date (end time room is used)

I am writing a curl command that should return every hour during which each room was in use.

Is this possible?

Here is my current curl command:

curl -XGET "https://localhost:9200/testindex/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "aggs": {
        "room_bucket": {
            "terms": {
                "field": "room_name.keyword"
            },
            "aggs": {
                "hour_bucket": {
                    "terms": {
                        "script": {
                            "inline": "def l = doc[\"start_date\"].value;\nif ( l <= 20 && l >= 9 ) {\n  return l;\n}",
                            "lang": "painless"
                        },
                        "order": {
                            "_key": "asc"
                        },
                        "value_type": "long"
                    }
                }
            }
        }
    }
}'

Here is the result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "testindex",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "log_version" : 1,
          "start_date" : 10,
          "end_date" : 11,
          "room_name" : "room_Y"
        }
      },
      {
        "_index" : "testindex",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "log_version" : 1,
          "start_date" : 11,
          "end_date" : 13,
          "room_name" : "room_V"
        }
      },
      {
        "_index" : "testindex",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "log_version" : 1,
          "start_date" : 10,
          "end_date" : 12,
          "room_name" : "room_Y"
        }
      }
    ]
  },
  "aggregations" : {
    "room_bucket" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "room_V",
          "doc_count" : 1,
          "hour_bucket" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : 11,
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : "room_Y",
          "doc_count" : 1,
          "hour_bucket" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : 10,
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    }
  }
}

But my expected result in the "aggregations" is the following:

"aggregations" : {
    "room_bucket" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "room_V",
          "doc_count" : 1,
          "hour_bucket" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : 11,
                "doc_count" : 1
              },
              {
                "key" : 12,
                "doc_count" : 1
              },
              {
                "key" : 13,
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : "room_Y",
          "doc_count" : 1,
          "hour_bucket" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : 10,
                "doc_count" : 2
              },
              {
                "key" : 11,
                "doc_count" : 2
              },
              {
                "key" : 12,
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    }
  }

The current result only reflects start_date.

In the expected output, however, room_V should have the keys 11, 12, and 13 (each with a doc_count of 1), because according to start_date and end_date the room was in use from 11 to 13.


Answer 1:


You can achieve what you want by leveraging LongStream and creating an array of all the hours in the interval, like this:

curl -XGET "https://localhost:9200/testindex/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "room_bucket": {
      "terms": {
        "field": "room_name.keyword"
      },
      "aggs": {
        "hour_bucket": {
          "terms": {
            "script": {
              "inline": "return LongStream.rangeClosed(doc.start_date.value, doc.end_date.value).toArray();",
              "lang": "painless"
            },
            "order": {
              "_key": "asc"
            },
            "value_type": "long"
          }
        }
      }
    }
  }
}'
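
To make the effect of the script easier to see, here is a minimal Python sketch of the same idea outside Elasticsearch. It is only an illustration, not how Elasticsearch computes the aggregation internally: the helper names hours_used and aggregate are hypothetical, and the documents are the three hits shown in the question. Each document is expanded into every hour between start_date and end_date (both ends included, like LongStream.rangeClosed), and the hours are then counted per room.

```python
from collections import Counter, defaultdict

# Sample documents copied from the search hits in the question.
docs = [
    {"room_name": "room_Y", "start_date": 10, "end_date": 11},
    {"room_name": "room_V", "start_date": 11, "end_date": 13},
    {"room_name": "room_Y", "start_date": 10, "end_date": 12},
]


def hours_used(doc):
    """Return every hour from start_date to end_date, both ends included.

    Hypothetical helper that mirrors LongStream.rangeClosed(start, end).
    """
    return list(range(doc["start_date"], doc["end_date"] + 1))


def aggregate(docs):
    """Group documents by room, then count how many documents cover each hour."""
    buckets = defaultdict(Counter)
    for doc in docs:
        buckets[doc["room_name"]].update(hours_used(doc))
    # Sort hour keys ascending, like "order": {"_key": "asc"} in the request.
    return {room: dict(sorted(hours.items())) for room, hours in sorted(buckets.items())}


result = aggregate(docs)
print(result)
```

For the sample data this gives hours 11, 12, and 13 once each for room_V, and hours 10 and 11 twice plus hour 12 once for room_Y, which matches the hour_bucket counts in the expected output.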


Source: https://stackoverflow.com/questions/54195303/how-to-derive-a-field-from-two-fields-in-an-elasticsearch-index
