Weighted Average for nested aggregation in Elastic Search

北城余情 提交于 2019-12-25 01:37:22

问题


I am trying to obtain the weighted average by aggregating a nested list. Each document has details of a single student, and the subjects vary across each student and each subject has different weights.

I am trying to calculate the weighted average subject-wise.

My documents are of the form -

[{'class': '10th',
 'id': '1',
'subject': [{'marks': 60, 'name': 's1', 'weight': 30},
         {'marks': 80, 'name': 's2', 'weight': 70}]},
{'class': '11th',
 'id': '2',
'subject': [{'marks': 43, 'name': 's10', 'weight': 40},
         {'marks': 54, 'name': 's20', 'weight': 60}]},
{'class': '10th',
 'id': '3',
'subject': [{'marks': 43, 'name': 's1', 'weight': 20},
         {'marks': 54, 'name': 's20', 'weight': 80}]},
{'class': '10th',
 'id': '4',
'subject': [{'marks': 69, 'name': 's10', 'weight': 30},
         {'marks': 45, 'name': 's2', 'weight': 70}]}]

Here s1,s10,s2,s20 are the subjects. For a given class, say "10th" I am trying to aggregate the weighted average.

The query I make is

GET students_try/_search
{
 "query": {
"match": {
  "class": "10th"
}
 },
"aggs": {
"subjects": {
  "nested": {
    "path": "subject"
  },
  "aggs": {
    "subjects": {
      "terms": {
        "field": "subject.name"
      },
      "aggs": {
        "avg_score": {
          "avg": {
            "field": "subject.marks"
          }
        },
        "weighted_grade": {
          "weighted_avg": {
            "value": {
              "field": "subject.marks"
            },
            "weight": {
              "field": "subject.weight"
            }
          }
        }
      }
    }
  }
}
  },
 "size": 0
}

The error I get is -

{u'error': {u'col': 211,
        u'line': 1,
        u'reason': u'Unknown BaseAggregationBuilder [weighted_avg]',
        u'root_cause': [{u'col': 211,
                         u'line': 1,
                         u'reason': u'Unknown BaseAggregationBuilder [weighted_avg]',
                         u'type': u'unknown_named_object_exception'}],
        u'type': u'unknown_named_object_exception'},
 u'status': 400}

I am not sure what is causing the error.


回答1:


Yes the weighted average as mentioned by Nishant only appears post 6.4 as mentioned in the section A few others in this link detailing 6.4 release

However I've come up with the below script using Bucket Script Aggregation which calculates the weighted avg for each and every bucket :

POST <your_index_name>/_search
{
  "size": 0,
  "query": {
    "match": {
      "class": "10th"
    }
  },
  "aggs": {
    "subjects": {
      "nested": {
        "path": "subject"
      },
      "aggs": {
        "subjects": {
          "terms": {
            "field": "subject.name.keyword"
          },
          "aggs": {
            "avg_score": {
              "avg": {
                "field": "subject.marks"
              }
            },
            "sum_score":{
              "sum_productOfMarksAndWeight": {
                "script": "doc['subject.marks'].value * doc['subject.weight'].value"
              }
            },
            "sum_weights": {
              "sum": {
                "field": "subject.weight"
              }
            },
            "weighted_avg":{
              "bucket_script": {
                "buckets_path": {
                  "sumScore": "sum_productOfMarksAndWeight",
                  "sumWeights": "sum_weights"
                },
                "script": "params.sumScore/params.sumWeights"
              }
            }
          }             
        }
      }
    }
  }
}

If you look at the above aggregation carefully, for every bucket I've calculated the sum of weights and sum of product of weights and marks using Sum Aggregation and then I've used these two aggregations to calculate the weighted aggregation.

Below is how your response appears. Notice that there is a caveat that you'd also see the sum of weights and sum of product of weights and marks in the aggregation result.

Response

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "subjects": {
      "doc_count": 6,
      "subjects": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "s1",
            "doc_count": 2,
            "sum_weights": {
              "value": 50
            },
            "sum_productOfMarksAndWeight": {
              "value": 2660
            },
            "avg_score": {
              "value": 51.5
            },
            "weighted_avg": {
              "value": 53.2
            }
          },
          {
            "key": "s2",
            "doc_count": 2,
            "sum_weights": {
              "value": 140
            },
            "sum_productOfMarksAndWeight": {
              "value": 8750
            },
            "avg_score": {
              "value": 62.5
            },
            "weighted_avg": {
              "value": 62.5
            }
          },
          {
            "key": "s10",
            "doc_count": 1,
            "sum_weights": {
              "value": 30
            },
            "sum_productOfMarksAndWeight": {
              "value": 2070
            },
            "avg_score": {
              "value": 69
            },
            "weighted_avg": {
              "value": 69
            }
          },
          {
            "key": "s20",
            "doc_count": 1,
            "sum_weights": {
              "value": 80
            },
            "sum_productOfMarksAndWeight": {
              "value": 4320
            },
            "avg_score": {
              "value": 54
            },
            "weighted_avg": {
              "value": 54
            }
          }
        ]
      }
    }
  }
}

I hope this helps, let me know if it doesn't and if you think this solves what you are looking for, please go ahead and accept this answer ;-)



来源:https://stackoverflow.com/questions/53498595/weighted-average-for-nested-aggregation-in-elastic-search

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!