Python elasticsearch DSL aggregation/metric of nested values per document

筅森魡賤 提交于 2019-12-11 02:39:57

问题


I'm trying to find the minimum (smallest) value in a 2-level nesting (separate minimum value per document).

So far I'm able to make an aggregation which counts the min value from all the nested values in my search results but without separation per document.

My example schema:

class MyExample(DocType):
    myexample_id = Integer()
    nested1 = Nested(
        properties={
            'timestamp': Date(),
            'foo': Nested(
                properties={
                    'bar': Float(),
                }
            )
        }
    )
    nested2 = Nested(
        multi=False,
        properties={
            'x': String(),
            'y': String(),
        }
    )

And this is how I'm searching and aggregating:

from elasticsearch_dsl import Search, Q

search = Search().filter(
    'nested', path='nested1', inner_hits={},
    query=Q(
        'range', **{
            'nested1.timestamp': {
                'gte': exampleDate1,
                'lte': exampleDate2
            }
        }
    )
).filter(
    'nested', path='nested2', inner_hits={'name': 'x'},
    query=Q(
        'term', **{
            'nested2.x': x
        }
    )
).filter(
    'nested', path='nested2', inner_hits={'name': 'y'},
    query=Q(
        'term', **{
            'nested2.y': y
        }
    )
)

search.aggs.bucket(
    'nested1', 'nested', path='nested1'
).bucket(
    'nested_foo', 'nested', path='nested1.foo'
).metric(
    'min_bar', 'min', field='nested1.foo.bar'
)

Basically what I need to do is to get the min value for all the nested nested1.foo.bar values for each unique MyExample (they have unique myexample_id field)


回答1:


If you want minimum value per document then put all the nested buckets within a bucket terms aggregation over myexample_id field:

search.aggs..bucket(
  'docs', 'terms', field='myexample_id'
).bucket(
  'nested1', 'nested', path='nested1'
).bucket(
  'nested_foo', 'nested', path='nested1.foo'
).metric(
  'min_bar', 'min', field='nested1.foo.bar'
)

Note that this aggregation might be extremely expensive to calculate since it has to create a bucket for each document. For a use case like this it might be easier to compute the minimum on a per document basis as a script_field or in the app.



来源:https://stackoverflow.com/questions/41594609/python-elasticsearch-dsl-aggregation-metric-of-nested-values-per-document

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!