I am new to ES and I have a specific requirement; my documents look like this:
{
  "_index" : "bidder_server_stats",
  "_type" : "doc",
  "_id"
You can achieve this with sub-aggregations. Starting from Elasticsearch 6.1, the composite aggregation can also come in handy (although it is still experimental).
The query might look like this:
POST bidder_server_stats/doc/_search
{
  "size": 0,
  "aggs": {
    "by bc": {
      "terms": {
        "field": "bc"
      },
      "aggs": {
        "by log_dt_st": {
          "terms": {
            "field": "log_dt_st"
          },
          "aggs": {
            "sum(avg_price)": {
              "sum": {
                "field": "avg_price"
              }
            }
          }
        }
      }
    }
  }
}
And the response would look like this:
{
  ...
  "aggregations": {
    "by bc": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "2513",
          "doc_count": 2,
          "by log_dt_st": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": 1528439776073,
                "key_as_string": "2018-06-08T06:36:16.073Z",
                "doc_count": 2,
                "sum(avg_price)": {
                  "value": 16.800000190734863
                }
              }
            ]
          }
        },
        {
          "key": "2514",
          "doc_count": 1,
          "by log_dt_st": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": 1528439776073,
                "key_as_string": "2018-06-08T06:36:16.073Z",
                "doc_count": 1,
                "sum(avg_price)": {
                  "value": 10
                }
              }
            ]
          }
        }
      ]
    }
  }
}
A few points to consider:
- bc should have the keyword type (to be able to run a terms aggregation on it);
- the terms aggregation only returns the top 10 buckets by default; you may be interested in the size and order options of this aggregation.

Update: Responding to the questions from the comments, since it will improve the answer.
No, not directly. As with SQL's GROUP BY, every field returned must be either part of the grouping key or wrapped in an aggregate function (in SQL terms: SELECT bc, log_dt_st, SUM(avg_price) ... GROUP BY bc, log_dt_st).
There are a few options to actually get more data alongside the aggregations: for example, requesting the matching documents themselves (their data will be returned in the hits part of the response).

I cannot find any relevant documentation or configuration setting that would give a definitive answer. However, there is an index.max_docvalue_fields_search setting that defaults to 100 in the dynamic index settings. Since aggregations use doc_values, I'd say that around 100 bucket aggregations is a reasonable upper limit.
I believe that the limitation here is the actual performance of your Elasticsearch cluster.
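One option for getting document data alongside the buckets is a top_hits sub-aggregation, which returns a few sample documents per bucket; a sketch (the sub-aggregation name and size are illustrative):

```json
POST bidder_server_stats/doc/_search
{
  "size": 0,
  "aggs": {
    "by bc": {
      "terms": { "field": "bc" },
      "aggs": {
        "sample docs": {
          "top_hits": { "size": 1 }
        }
      }
    }
  }
}
```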
It can be done, but it might not be efficient. You may use the script mode of the terms aggregation. The query might look like this:
POST bidder_server_stats/doc/_search
{
  "size": 0,
  "aggs": {
    "via script": {
      "terms": {
        "script": {
          "source": "doc['bc'].value +':::'+ doc['log_dt_st'].value",
          "lang": "painless"
        }
      },
      "aggs": {
        "sum(avg_price)": {
          "sum": {
            "field": "avg_price"
          }
        }
      }
    }
  }
}
And the result will look like the following:
{
  ...
  "aggregations": {
    "via script": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "2513:::2018-06-08T06:36:16.073Z",
          "doc_count": 2,
          "sum(avg_price)": {
            "value": 16.800000190734863
          }
        },
        {
          "key": "2514:::2018-06-08T06:36:16.073Z",
          "doc_count": 1,
          "sum(avg_price)": {
            "value": 10
          }
        }
      ]
    }
  }
}
In order to perform this aggregation, Elasticsearch has to compute the bucket value for each document matching the query, which is the equivalent of a full scan in SQL. Plain field-based aggregations, by contrast, are more like index look-ups, since they use the doc_values data representation, a columnar data structure that makes these look-ups efficient.
In some cases script buckets can be a solution, but their scope is quite limited. If you are interested in a script-based solution, there is also the scripted metric aggregation to consider.
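As an illustration of the scripted metric aggregation, here is a sketch of computing the same sum of avg_price with it, using the Elasticsearch 6.x script contexts (params._agg / params._aggs; note that 7.x renamed these to state / states, so treat this as version-specific):

```json
POST bidder_server_stats/doc/_search
{
  "size": 0,
  "aggs": {
    "scripted sum(avg_price)": {
      "scripted_metric": {
        "init_script": "params._agg.sum = 0",
        "map_script": "params._agg.sum += doc['avg_price'].value",
        "combine_script": "return params._agg.sum",
        "reduce_script": "double s = 0; for (a in params._aggs) { s += a } return s"
      }
    }
  }
}
```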
Hope that helps!
Update: composite aggregation

The composite aggregation was added in Elasticsearch 6.1. As of 6.3 it is still marked as experimental (so the API might change, or this feature could be removed completely in the future).
The query in this case would look like:
POST bidder_server_stats/doc/_search
{
  "size": 0,
  "aggs": {
    "my composite": {
      "composite": {
        "sources": [
          {
            "bc": {
              "terms": {
                "field": "bc"
              }
            }
          },
          {
            "log_dt_st": {
              "terms": {
                "field": "log_dt_st"
              }
            }
          }
        ]
      },
      "aggs": {
        "sum(avg_price)": {
          "sum": {
            "field": "avg_price"
          }
        }
      }
    }
  }
}
And the response:
{
  "aggregations": {
    "my composite": {
      "after_key": {
        "bc": "2514",
        "log_dt_st": 1528439776073
      },
      "buckets": [
        {
          "key": {
            "bc": "2513",
            "log_dt_st": 1528439776073
          },
          "doc_count": 2,
          "sum(avg_price)": {
            "value": 16.800000190734863
          }
        },
        {
          "key": {
            "bc": "2514",
            "log_dt_st": 1528439776073
          },
          "doc_count": 1,
          "sum(avg_price)": {
            "value": 10
          }
        }
      ]
    }
  }
}
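A useful property of composite is that it can be paginated: feed the after_key from one response back into the after parameter to fetch the next page of buckets. A sketch, reusing the after_key from the response above (the size value is illustrative):

```json
POST bidder_server_stats/doc/_search
{
  "size": 0,
  "aggs": {
    "my composite": {
      "composite": {
        "size": 2,
        "after": { "bc": "2514", "log_dt_st": 1528439776073 },
        "sources": [
          { "bc": { "terms": { "field": "bc" } } },
          { "log_dt_st": { "terms": { "field": "log_dt_st" } } }
        ]
      },
      "aggs": {
        "sum(avg_price)": {
          "sum": { "field": "avg_price" }
        }
      }
    }
  }
}
```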