Django & Postgres - percentile (median) and group by

别来无恙 submitted on 2020-05-28 07:27:27

Question


I need to calculate period medians per seller ID (see simplified model below). The problem is that I am unable to construct the ORM query.

Model

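from django.contrib.postgres.fields import ArrayField, JSONField
from django.db import models


class MyModel(models.Model):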
    period = models.IntegerField(null=True, default=None)
    seller_ids = ArrayField(models.IntegerField(), default=list)
    aux = JSONField(default=dict)

Query

queryset = (
    MyModel.objects.filter(period=25)
    .annotate(seller_id=Func(F("seller_ids"), function="unnest"))
    .values("seller_id")
    .annotate(
        duration=Cast(KeyTextTransform("duration", "aux"), IntegerField()),
        median=Func(
            F("duration"),
            function="percentile_cont",
            template="%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)",
        ),
    )
    .values("median", "seller_id")
)

ArrayField aggregation (seller_id) source


I think what I need is something along these lines:

select t.*, p_25, p_75
from t join
     (select district,
             percentile_cont(0.25) within group (order by sales) as p_25,
             percentile_cont(0.75) within group (order by sales) as p_75
      from t
      group by district
     ) td
     on t.district = td.district

above example source
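
To make the target concrete: the subquery computes the 25th and 75th percentiles of sales per district, and the join attaches them to every row of that district. Below is a rough pure-Python sketch of that logic on made-up data. It assumes Python 3.8+ for statistics.quantiles, whose "inclusive" method uses the same linear interpolation as percentile_cont, and it assumes at least two rows per district.

```python
from collections import defaultdict
from statistics import quantiles

# Made-up stand-in for table t: (district, sales)
t = [
    ("north", 10), ("north", 20), ("north", 30), ("north", 40),
    ("south", 5), ("south", 15),
]

# Subquery "td": group sales by district, then take the quartiles.
sales_by_district = defaultdict(list)
for district, sales in t:
    sales_by_district[district].append(sales)

td = {}
for district, values in sales_by_district.items():
    # n=4 yields three cut points: 25th, 50th and 75th percentile.
    p_25, _p_50, p_75 = quantiles(values, n=4, method="inclusive")
    td[district] = (p_25, p_75)

# Join: attach each district's percentiles to every one of its rows.
joined = [(district, sales, *td[district]) for district, sales in t]

for row in joined:
    print(row)
```

Every "north" row carries the same (p_25, p_75) pair, which is exactly the duplication the join in the SQL produces.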


Python 3.7.5, Django 2.2.8, Postgres 11.1


Answer 1:


Here's what did the trick.

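# Import paths as of Django 2.2, the version used in the question
from django.contrib.postgres.fields.jsonb import KeyTextTransform
from django.db.models import F, Func, IntegerField
from django.db.models.aggregates import Aggregate
from django.db.models.functions import Cast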


queryset = (
    MyModel.objects.filter(period=25)
    .annotate(duration=Cast(KeyTextTransform("duration", "aux"), IntegerField()))
    .filter(duration__isnull=False)
    .annotate(seller_id=Func(F("seller_ids"), function="unnest"))
    .values("seller_id")  # group by
    .annotate(
        median=Aggregate(
            F("duration"),
            function="percentile_cont",
            template="%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)",
        ),
    )
)

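Notice that the median annotation uses Aggregate and not Func as in the question: Django only generates a GROUP BY for expressions it recognizes as aggregates, whereas a plain Func is treated as an ordinary per-row expression. The order of the clauses also matters a lot. Here duration is annotated and filtered first, and values("seller_id") comes before the aggregate annotation, which is what turns seller_id into the GROUP BY column.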

Incidentally, the resulting SQL needs neither a nested SELECT nor a JOIN.
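
As a sanity check on what the query is meant to compute, the same result can be reproduced in plain Python on a handful of rows. This is only a sketch: the sample rows are made up, and median_per_seller is a hypothetical helper name, not part of Django. It relies on percentile_cont(0.5) averaging the two middle values for an even count, which statistics.median does as well.

```python
from collections import defaultdict
from statistics import median

# Made-up rows mirroring MyModel: (period, seller_ids, aux)
rows = [
    (25, [1, 2], {"duration": 10}),
    (25, [1], {"duration": 20}),
    (25, [2, 3], {"duration": 40}),
    (25, [1], {}),                 # no duration: dropped by the filter
    (24, [1], {"duration": 999}),  # different period: dropped by the filter
]


def median_per_seller(rows, period):
    """Hypothetical helper emulating filter + unnest + GROUP BY + median."""
    durations = defaultdict(list)
    for row_period, seller_ids, aux in rows:
        if row_period != period or aux.get("duration") is None:
            continue
        for seller_id in seller_ids:  # the "unnest" step: one entry per seller
            durations[seller_id].append(int(aux["duration"]))
    return {seller_id: median(values) for seller_id, values in durations.items()}


result = median_per_seller(rows, period=25)
print(result)  # {1: 15.0, 2: 25.0, 3: 40}
```

Comparing a few values from such a helper against the queryset output is a quick way to confirm that the grouping happens per seller rather than per row.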




Answer 2:


You can create a Median subclass of Aggregate, as Ryan Murphy did (https://gist.github.com/rdmurphy/3f73c7b1826cacee34f6c2a855b12e2e). Median then works just like Avg:

    from django.db.models import Aggregate, FloatField


    class Median(Aggregate):
        function = 'PERCENTILE_CONT'
        name = 'median'
        output_field = FloatField()
        template = '%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)'

Then to find the median of a field use

    my_model_aggregate = MyModel.objects.all().aggregate(Median('period'))

which is then available as my_model_aggregate['period__median'].



Source: https://stackoverflow.com/questions/59686945/django-postgres-percentile-median-and-group-by
