Question
I'd like to return the latest denormalized data from an Elasticsearch index, grouped by a key field (TradeRef in the example below).
The following sample shows the data persisted in the index:
{"Row": "1", "TradeRef": "A", "TradeRefDate": "2019-01-01 13:00", "TradeRefId": "FFF", "MessageId": "XXX", "MessageStatus": "S-Open"},
{"Row": "2", "TradeRef": "B", "TradeRefDate": "2019-01-01 13:00", "TradeRefId": "GGG", "MessageId": "YYY", "MessageStatus": "P-Open"},
{"Row": "3", "TradeRef": "C", "TradeRefDate": "2019-01-01 13:00", "TradeRefId": "HHH", "MessageId": "ZZZ", "MessageStatus": "Q-Open"},
{"Row": "4", "TradeRef": "A", "TradeRefDate": "2019-01-01 14:00", "TradeRefId": "III", "MessageId": "AAA", "MessageStatus": "R-Open"},
{"Row": "5", "TradeRef": "B", "TradeRefDate": "2019-01-01 14:00", "TradeRefId": "JJJ", "MessageId": "BBB", "MessageStatus": "T-Open"},
{"Row": "6", "TradeRef": "A", "TradeRefDate": "2019-01-01 14:00", "TradeRefId": "III", "MessageId": "CCC", "MessageStatus": "R-Open"},
{"Row": "7", "TradeRef": "B", "TradeRefDate": "2019-01-01 14:00", "TradeRefId": "JJJ", "MessageId": "DDD", "MessageStatus": "T-Open"}
I want my query to return the results below. Rows 1 and 2 are eliminated because they reference TradeRefs 'A' and 'B' with an older TradeRefDate (2019-01-01 13:00).
More recent rows in the index carry the same TradeRefs 'A' and 'B' with a newer TradeRefDate (2019-01-01 14:00):
{"Row": "3", "TradeRef": "C", "TradeRefDate": "2019-01-01 13:00", "TradeRefId": "HHH", "MessageId": "ZZZ", "MessageStatus": "Q-Open"},
{"Row": "4", "TradeRef": "A", "TradeRefDate": "2019-01-01 14:00", "TradeRefId": "III", "MessageId": "AAA", "MessageStatus": "R-Open"},
{"Row": "5", "TradeRef": "B", "TradeRefDate": "2019-01-01 14:00", "TradeRefId": "JJJ", "MessageId": "BBB", "MessageStatus": "T-Open"},
{"Row": "6", "TradeRef": "A", "TradeRefDate": "2019-01-01 14:00", "TradeRefId": "III", "MessageId": "CCC", "MessageStatus": "R-Open"},
{"Row": "7", "TradeRef": "B", "TradeRefDate": "2019-01-01 14:00", "TradeRefId": "JJJ", "MessageId": "DDD", "MessageStatus": "T-Open"}
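To pin down the rule, here is a minimal Python sketch of the selection described above, run against the sample rows (trimmed to the fields the rule needs). The function name latest_rows_per_trade_ref is only illustrative, and the sketch assumes the "yyyy-MM-dd HH:mm" strings compare chronologically as plain strings:

```python
# Sample rows from the question, trimmed to the fields the rule needs.
rows = [
    {"Row": "1", "TradeRef": "A", "TradeRefDate": "2019-01-01 13:00"},
    {"Row": "2", "TradeRef": "B", "TradeRefDate": "2019-01-01 13:00"},
    {"Row": "3", "TradeRef": "C", "TradeRefDate": "2019-01-01 13:00"},
    {"Row": "4", "TradeRef": "A", "TradeRefDate": "2019-01-01 14:00"},
    {"Row": "5", "TradeRef": "B", "TradeRefDate": "2019-01-01 14:00"},
    {"Row": "6", "TradeRef": "A", "TradeRefDate": "2019-01-01 14:00"},
    {"Row": "7", "TradeRef": "B", "TradeRefDate": "2019-01-01 14:00"},
]


def latest_rows_per_trade_ref(rows):
    """Keep every row whose TradeRefDate is the newest one seen for its TradeRef."""
    newest = {}
    for row in rows:
        ref, date = row["TradeRef"], row["TradeRefDate"]
        # Zero-padded "yyyy-MM-dd HH:mm" strings sort chronologically,
        # so a plain string comparison is enough here.
        if ref not in newest or date > newest[ref]:
            newest[ref] = date
    return [row for row in rows if row["TradeRefDate"] == newest[row["TradeRef"]]]


kept = latest_rows_per_trade_ref(rows)
print([row["Row"] for row in kept])  # prints ['3', '4', '5', '6', '7']
```

Note that a TradeRef can keep more than one row (A keeps rows 4 and 6), which is exactly what a top_hits of size 1 per TradeRef cannot express.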
Any assistance will be appreciated. I have tried the query below, but it returns only one row per TradeRef instead of all the records that share the latest TradeRefDate for that TradeRef:
GET /flattened_index_v1/_search
{
"from": 0,
"query": {
"bool": {
"must": [
{
"range": {
"TradeRefDate": {
"gte": "2018-09-01T00:00:00",
"lte": "2019-10-26T00:00:00"
}
}
},
{
"exists": {
"field": "MessageId"
}
}
]
}
},
"size": 0,
"aggs": {
"grp_by_trade_ref": {
"terms": {
"field": "TradeRef.keyword",
"size": 1000
},
"aggs": {
"latest_trecs": {
"top_hits": {
"size": 1,
"sort": [
{
"TradeRefDate": {
"order": "desc"
}
}
],
"_source": {"includes": ["TradeRef", "TradeRefId", "MessageId", "MessageStatus", "TradeRefDate"]}
}
}
}
}
}
}
Answer 1:
- Do a terms aggregation on the TradeRef keyword field.
- Under each TradeRef bucket, do a terms aggregation on TradeRefDate:
  i. Select the top 1 bucket, ordered by date descending.
  ii. Return the documents in that bucket with top_hits.
GET index22/_search
{
"size": 0,
"aggs": {
"TradeRef": {
"terms": {
"field": "TradeRef.keyword",
"size": 10
},
"aggs": {
"RefDate": {
"terms": {
"field": "TradeRefDate",
"order": {
"_key": "desc"
},
"size": 1
},
"aggs": {
"TopDocuments": {
"top_hits": {
"size": 10
}
}
}
}
}
}
}
}
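The response to this query nests the documents three levels deep (TradeRef bucket, RefDate bucket, top_hits). As a rough sketch, a client could flatten it back into plain rows like this. The helper name flatten_latest_hits and the hand-made sample response are assumptions for illustration. Only the aggregation names (TradeRef, RefDate, TopDocuments) come from the query above:

```python
def flatten_latest_hits(response, outer="TradeRef", inner="RefDate", hits="TopDocuments"):
    """Collect the _source of every top hit from a terms -> terms -> top_hits response."""
    docs = []
    for ref_bucket in response["aggregations"][outer]["buckets"]:
        # The inner terms aggregation uses size 1, so there is normally a
        # single date bucket per TradeRef: the newest one.
        for date_bucket in ref_bucket[inner]["buckets"]:
            for hit in date_bucket[hits]["hits"]["hits"]:
                docs.append(hit["_source"])
    return docs


# Hand-made, heavily trimmed response used only to exercise the helper.
sample_response = {
    "aggregations": {
        "TradeRef": {
            "buckets": [
                {
                    "key": "A",
                    "RefDate": {
                        "buckets": [
                            {
                                "TopDocuments": {
                                    "hits": {
                                        "hits": [
                                            {"_source": {"Row": "4", "TradeRef": "A"}},
                                            {"_source": {"Row": "6", "TradeRef": "A"}},
                                        ]
                                    }
                                }
                            }
                        ]
                    },
                },
                {
                    "key": "C",
                    "RefDate": {
                        "buckets": [
                            {
                                "TopDocuments": {
                                    "hits": {"hits": [{"_source": {"Row": "3", "TradeRef": "C"}}]}
                                }
                            }
                        ]
                    },
                },
            ]
        }
    }
}

flat = flatten_latest_hits(sample_response)
print([doc["Row"] for doc in flat])  # prints ['4', '6', '3']
```

Keep in mind that top_hits returns at most its "size" documents per bucket (10 in the query above), so a group with more rows than that would be cut off.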
EDIT 1: You can use a composite aggregation (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html) to paginate the TradeRef buckets.
A composite aggregation paginates sequentially via after_key: you fetch n buckets, then pass the returned after_key as "after" in the next request to fetch the following n. You cannot jump from page 1 to page 3. The "size" of the composite aggregation is the page size (2 in the query below).
GET index22/_search
{
"size": 0,
"aggs": {
"pagination": {
"composite": {
"size": 2,
"sources": [
{
"TradeRef": {
"terms": {
"field": "TradeRef.keyword"
}
}
}
]
},
"aggs": {
"RefDate": {
"terms": {
"field": "TradeRefDate",
"order": {
"_key": "desc"
},
"size": 1
},
"aggs": {
"TopDocuments": {
"top_hits": {
"size": 10
}
}
}
}
}
}
}
}
Response:
"aggregations" : {
"pagination" : {
"after_key" : {
"TradeRef" : "B" ----> use to fetch next set of records.
},
"buckets" : [
{
"key" : {
"TradeRef" : "A"
},
"doc_count" : 3,
"RefDate" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 1,
"buckets" : [
{
"key" : 50460000,
"key_as_string" : "1970-01-01 14:00",
"doc_count" : 2,
"TopDocuments" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index22",
"_type" : "_doc",
"_id" : "IkBnyG0BwSpwFwW4UeB7",
"_score" : 1.0,
"_source" : {
"Row" : "4",
"TradeRef" : "A",
"TradeRefDate" : "2019-01-01 14:00",
"TradeRefId" : "III",
"MessageId" : "AAA",
"MessageStatus" : "R-Open"
}
},
{
"_index" : "index22",
"_type" : "_doc",
"_id" : "JEBnyG0BwSpwFwW4ceBs",
"_score" : 1.0,
"_source" : {
"Row" : "6",
"TradeRef" : "A",
"TradeRefDate" : "2019-01-01 14:00",
"TradeRefId" : "III",
"MessageId" : "CCC",
"MessageStatus" : "R-Open"
}
}
]
}
}
}
]
}
},
{
"key" : {
"TradeRef" : "B"
},
"doc_count" : 3,
"RefDate" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 1,
"buckets" : [
{
"key" : 50460000,
"key_as_string" : "1970-01-01 14:00",
"doc_count" : 2,
"TopDocuments" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index22",
"_type" : "_doc",
"_id" : "I0BnyG0BwSpwFwW4V-DW",
"_score" : 1.0,
"_source" : {
"Row" : "5",
"TradeRef" : "B",
"TradeRefDate" : "2019-01-01 14:00",
"TradeRefId" : "JJJ",
"MessageId" : "BBB",
"MessageStatus" : "T-Open"
}
},
{
"_index" : "index22",
"_type" : "_doc",
"_id" : "JUBnyG0BwSpwFwW4h-Cq",
"_score" : 1.0,
"_source" : {
"Row" : "7",
"TradeRef" : "B",
"TradeRefDate" : "2019-01-01 14:00",
"TradeRefId" : "JJJ",
"MessageId" : "DDD",
"MessageStatus" : "T-Open"
}
}
]
}
}
}
]
}
}
]
}
}
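The after_key in the response is what drives the next page. A minimal sketch of that loop follows, assuming a hypothetical search callable that takes a request body (a dict) and returns the parsed response (a dict). It stands in for whichever client you use and is not part of any library. The request mirrors the composite query above, with `_key` as the sort key (the name current Elasticsearch versions use in place of `_term`):

```python
def build_page_request(page_size, after_key=None):
    """Build the composite-aggregation request body for one page of TradeRef buckets."""
    composite = {
        "size": page_size,
        "sources": [{"TradeRef": {"terms": {"field": "TradeRef.keyword"}}}],
    }
    if after_key is not None:
        # Resume right after the last bucket of the previous page.
        composite["after"] = after_key
    return {
        "size": 0,
        "aggs": {
            "pagination": {
                "composite": composite,
                "aggs": {
                    "RefDate": {
                        "terms": {
                            "field": "TradeRefDate",
                            "order": {"_key": "desc"},
                            "size": 1,
                        },
                        "aggs": {"TopDocuments": {"top_hits": {"size": 10}}},
                    }
                },
            }
        },
    }


def iter_pages(search, page_size=2):
    """Yield the bucket list of each page until the aggregation is exhausted.

    `search` is a hypothetical callable: request body in, response dict out.
    """
    after_key = None
    while True:
        response = search(build_page_request(page_size, after_key))
        agg = response["aggregations"]["pagination"]
        if not agg["buckets"]:
            break  # an empty page means there is nothing left to fetch
        yield agg["buckets"]
        after_key = agg.get("after_key")
        if after_key is None:
            break
```

Because each request depends on the after_key of the previous one, the pages can only be walked in order, which matches the limitation noted above.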
Source: https://stackoverflow.com/questions/58368130/elasticsearch-query-to-get-latest-version-of-records-from-a-flattened-structur