Elasticsearch - query to get latest version of records from a flattened structure

Submitted by 青春壹個敷衍的年華 on 2021-01-29 05:21:10

Question


I'd like to return the latest de-normalized records from an Elasticsearch index, grouped by a key field (TradeRef in the example below).

The following sample shows the data persisted in the index:

{"Row": "1", "TradeRef": "A", "TradeRefDate": "2019-01-01 13:00", "TradeRefId": "FFF", "MessageId": "XXX", "MessageStatus": "S-Open"}, 
{"Row": "2", "TradeRef": "B", "TradeRefDate": "2019-01-01 13:00", "TradeRefId": "GGG", "MessageId": "YYY", "MessageStatus": "P-Open"},
{"Row": "3", "TradeRef": "C", "TradeRefDate": "2019-01-01 13:00", "TradeRefId": "HHH", "MessageId": "ZZZ", "MessageStatus": "Q-Open"},
{"Row": "4", "TradeRef": "A", "TradeRefDate": "2019-01-01 14:00", "TradeRefId": "III", "MessageId": "AAA", "MessageStatus": "R-Open"},
{"Row": "5", "TradeRef": "B", "TradeRefDate": "2019-01-01 14:00", "TradeRefId": "JJJ", "MessageId": "BBB", "MessageStatus": "T-Open"},
{"Row": "6", "TradeRef": "A", "TradeRefDate": "2019-01-01 14:00", "TradeRefId": "III", "MessageId": "CCC", "MessageStatus": "R-Open"},
{"Row": "7", "TradeRef": "B", "TradeRefDate": "2019-01-01 14:00", "TradeRefId": "JJJ", "MessageId": "DDD", "MessageStatus": "T-Open"}

I want my query to return the following results. Rows 1 and 2 are eliminated because they reference TradeRefs 'A' and 'B' with an older TradeRefDate (2019-01-01 13:00).

Other rows in the index contain the same TradeRefs 'A' and 'B' with a more recent TradeRefDate (2019-01-01 14:00):

{"Row": "3", "TradeRef": "C", "TradeRefDate": "2019-01-01 13:00", "TradeRefId": "HHH", "MessageId": "ZZZ", "MessageStatus": "Q-Open"},
{"Row": "4", "TradeRef": "A", "TradeRefDate": "2019-01-01 14:00", "TradeRefId": "III", "MessageId": "AAA", "MessageStatus": "R-Open"},
{"Row": "5", "TradeRef": "B", "TradeRefDate": "2019-01-01 14:00", "TradeRefId": "JJJ", "MessageId": "BBB", "MessageStatus": "T-Open"},
{"Row": "6", "TradeRef": "A", "TradeRefDate": "2019-01-01 14:00", "TradeRefId": "III", "MessageId": "CCC", "MessageStatus": "R-Open"},
{"Row": "7", "TradeRef": "B", "TradeRefDate": "2019-01-01 14:00", "TradeRefId": "JJJ", "MessageId": "DDD", "MessageStatus": "T-Open"}

Any assistance will be appreciated. I have tried the query below, but it returns only one row per TradeRef instead of all the records that share the latest TradeRefDate for that TradeRef:

GET /flattened_index_v1/_search
{
  "from": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "TradeRefDate": {
              "gte": "2018-09-01T00:00:00",
              "lte": "2019-10-26T00:00:00"
            }
          }
        },
        {
          "exists": {
            "field": "MessageId"
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "grp_by_trade_ref": {
      "terms": {
        "field": "TradeRef.keyword",
        "size": 1000
      },
      "aggs": {
        "latest_trecs": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "TradeRefDate": {
                  "order": "desc"
                }
              }
            ],
            "_source": {"includes": ["TradeRef", "TradeRefId", "MessageId", "MessageStatus", "TradeRefDate"]}
          }
        }
      }
    }
  }
}

Answer 1:


  1. Do a terms aggregation on the TradeRef keyword field.
  2. Under each TradeRef bucket, do a terms aggregation on TradeRefDate:
     i. Select the top 1 bucket, ordered by date descending.
     ii. Return the documents in that bucket with top_hits.

GET index22/_search
{
  "size": 0,
  "aggs": {
    "TradeRef": {
      "terms": {
        "field": "TradeRef.keyword",
        "size": 10
      },
      "aggs": {
        "RefDate": {
          "terms": {
            "field": "TradeRefDate",
            "order": {
              "_term": "desc"
            },
            "size": 1
          },
          "aggs": {
            "TopDocuments": {
              "top_hits": {
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}
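
The matching documents come back nested three levels deep (TradeRef bucket, RefDate bucket, TopDocuments hits). As a rough sketch, assuming the response has already been parsed into a Python dict, the hypothetical helper below flattens it back into a plain list of records. The sample dict is hand-built and abridged, not real server output.

```python
def flatten_latest_hits(response, outer="TradeRef", inner="RefDate", hits="TopDocuments"):
    """Collect the _source of every top hit from the nested aggregation response."""
    docs = []
    for ref_bucket in response["aggregations"][outer]["buckets"]:
        for date_bucket in ref_bucket[inner]["buckets"]:
            for hit in date_bucket[hits]["hits"]["hits"]:
                docs.append(hit["_source"])
    return docs


def _date_bucket(*sources):
    """Build one abridged RefDate bucket holding the given documents (illustration only)."""
    return {"TopDocuments": {"hits": {"hits": [{"_source": s} for s in sources]}}}


# Hand-built fragment in the shape the query above returns (most fields omitted).
sample_response = {
    "aggregations": {
        "TradeRef": {
            "buckets": [
                {"key": "A", "RefDate": {"buckets": [
                    _date_bucket({"Row": "4", "TradeRef": "A"}, {"Row": "6", "TradeRef": "A"})
                ]}},
                {"key": "C", "RefDate": {"buckets": [
                    _date_bucket({"Row": "3", "TradeRef": "C"})
                ]}},
            ]
        }
    }
}

print([doc["Row"] for doc in flatten_latest_hits(sample_response)])
```

The aggregation names are passed as parameters because they are whatever names the request chose; the defaults match the query above.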

EDIT 1: You can use a composite aggregation (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html).

With a composite aggregation you can only paginate sequentially using after_key: you fetch n buckets, then the next n buckets, but you cannot jump from page 1 to page 3. The composite "size" in the request below is the page size.

GET index22/_search
{
  "size": 0,
  "aggs": {
    "pagination": {
      "composite": {
        "size": 2,
        "sources": [
          {
            "TradeRef": {
              "terms": {
                "field": "TradeRef.keyword"
              }
            }
          }
        ]
      },
      "aggs": {
        "RefDate": {
          "terms": {
            "field": "TradeRefDate",
            "order": {
              "_term": "desc"
            },
            "size": 1
          },
          "aggs": {
            "TopDocuments": {
              "top_hits": {
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}

Response (pass the after_key value as "after" inside the composite block of the next request to fetch the next set of records):

"aggregations" : {
    "pagination" : {
      "after_key" : {
        "TradeRef" : "B"
      },
      "buckets" : [
        {
          "key" : {
            "TradeRef" : "A"
          },
          "doc_count" : 3,
          "RefDate" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 1,
            "buckets" : [
              {
                "key" : 50460000,
                "key_as_string" : "1970-01-01 14:00",
                "doc_count" : 2,
                "TopDocuments" : {
                  "hits" : {
                    "total" : {
                      "value" : 2,
                      "relation" : "eq"
                    },
                    "max_score" : 1.0,
                    "hits" : [
                      {
                        "_index" : "index22",
                        "_type" : "_doc",
                        "_id" : "IkBnyG0BwSpwFwW4UeB7",
                        "_score" : 1.0,
                        "_source" : {
                          "Row" : "4",
                          "TradeRef" : "A",
                          "TradeRefDate" : "2019-01-01 14:00",
                          "TradeRefId" : "III",
                          "MessageId" : "AAA",
                          "MessageStatus" : "R-Open"
                        }
                      },
                      {
                        "_index" : "index22",
                        "_type" : "_doc",
                        "_id" : "JEBnyG0BwSpwFwW4ceBs",
                        "_score" : 1.0,
                        "_source" : {
                          "Row" : "6",
                          "TradeRef" : "A",
                          "TradeRefDate" : "2019-01-01 14:00",
                          "TradeRefId" : "III",
                          "MessageId" : "CCC",
                          "MessageStatus" : "R-Open"
                        }
                      }
                    ]
                  }
                }
              }
            ]
          }
        },
        {
          "key" : {
            "TradeRef" : "B"
          },
          "doc_count" : 3,
          "RefDate" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 1,
            "buckets" : [
              {
                "key" : 50460000,
                "key_as_string" : "1970-01-01 14:00",
                "doc_count" : 2,
                "TopDocuments" : {
                  "hits" : {
                    "total" : {
                      "value" : 2,
                      "relation" : "eq"
                    },
                    "max_score" : 1.0,
                    "hits" : [
                      {
                        "_index" : "index22",
                        "_type" : "_doc",
                        "_id" : "I0BnyG0BwSpwFwW4V-DW",
                        "_score" : 1.0,
                        "_source" : {
                          "Row" : "5",
                          "TradeRef" : "B",
                          "TradeRefDate" : "2019-01-01 14:00",
                          "TradeRefId" : "JJJ",
                          "MessageId" : "BBB",
                          "MessageStatus" : "T-Open"
                        }
                      },
                      {
                        "_index" : "index22",
                        "_type" : "_doc",
                        "_id" : "JUBnyG0BwSpwFwW4h-Cq",
                        "_score" : 1.0,
                        "_source" : {
                          "Row" : "7",
                          "TradeRef" : "B",
                          "TradeRefDate" : "2019-01-01 14:00",
                          "TradeRefId" : "JJJ",
                          "MessageId" : "DDD",
                          "MessageStatus" : "T-Open"
                        }
                      }
                    ]
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
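
The sequential paging can be sketched as a small loop. This is only an illustration: search is a hypothetical callable that takes a request body (dict) and returns the parsed response (dict), and fake_search is a stand-in that imitates composite paging over three TradeRef keys so the loop can run without a cluster.

```python
import copy


def iter_composite_buckets(search, body, agg_name="pagination"):
    """Yield every composite bucket, feeding each after_key back in as "after"."""
    body = copy.deepcopy(body)  # do not mutate the caller's request body
    while True:
        agg = search(body)["aggregations"][agg_name]
        yield from agg["buckets"]
        after_key = agg.get("after_key")
        if after_key is None or not agg["buckets"]:
            break  # an empty page means there is nothing left to fetch
        body["aggs"][agg_name]["composite"]["after"] = after_key


def fake_search(body):
    """Stand-in for a real search call (assumption, for demonstration only)."""
    keys = ["A", "B", "C"]
    composite = body["aggs"]["pagination"]["composite"]
    after = composite.get("after", {}).get("TradeRef")
    remaining = [k for k in keys if after is None or k > after]
    page = remaining[: composite["size"]]
    agg = {"buckets": [{"key": {"TradeRef": k}} for k in page]}
    if page:
        agg["after_key"] = {"TradeRef": page[-1]}
    return {"aggregations": {"pagination": agg}}


request_body = {
    "size": 0,
    "aggs": {
        "pagination": {
            "composite": {
                "size": 2,
                "sources": [{"TradeRef": {"terms": {"field": "TradeRef.keyword"}}}],
            }
        }
    },
}

for bucket in iter_composite_buckets(fake_search, request_body):
    print(bucket["key"]["TradeRef"])
```

In real use, search would wrap whatever client call sends the body to the index, and the sub-aggregations from the request above would stay in the body unchanged between pages.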


Source: https://stackoverflow.com/questions/58368130/elasticsearch-query-to-get-latest-version-of-records-from-a-flattened-structur
