Django Annotated Query to Count all entities used in a Reverse Relationship

Submitted by 限于喜欢 on 2020-03-25 18:49:26

Question


This is a follow-up to this SO question: Django Annotated Query to Count Only Latest from Reverse Relationship

Given these models:

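from django.db import models

# BaseModel is not shown in the question; presumably an abstract base model.
class Candidate(BaseModel):
    name = models.CharField(max_length=128)

class Status(BaseModel):
    name = models.CharField(max_length=128)

class StatusChange(BaseModel):
    # on_delete is required since Django 2.0; CASCADE is an assumption here.
    candidate = models.ForeignKey("Candidate", on_delete=models.CASCADE, related_name="status_changes")
    status = models.ForeignKey("Status", on_delete=models.CASCADE, related_name="status_changes")
    created_at = models.DateTimeField(auto_now_add=True, blank=True)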

Represented by these tables:

candidates
+----+--------------+
| id | name         |
+----+--------------+
|  1 | Beth         |
|  2 | Mark         |
|  3 | Mike         |
|  4 | Ryan         |
+----+--------------+

status
+----+--------------+
| id | name         |
+----+--------------+
|  1 | Review       |
|  2 | Accepted     |
|  3 | Rejected     |
+----+--------------+

status_change
+----+--------------+-----------+------------+
| id | candidate_id | status_id | created_at |
+----+--------------+-----------+------------+
|  1 | 1            | 1         | 03-01-2019 |
|  2 | 1            | 2         | 05-01-2019 |
|  4 | 2            | 1         | 01-01-2019 |
|  5 | 3            | 1         | 01-01-2019 |
|  6 | 4            | 3         | 01-01-2019 |
+----+--------------+-----------+------------+

I wanted to get a count of each status type, but only include the last status for each candidate:

last_status_count
+-----------+-------------+--------+
| status_id | status_name | count  |
+-----------+-------------+--------+
| 1         | Review      | 2      | 
| 2         | Accepted    | 1      | 
| 3         | Rejected    | 1      |
+-----------+-------------+--------+
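To make the target concrete, here is a minimal plain-Python sketch of the same computation on the sample rows above. It is only an illustration of the expected result, not ORM code; the dates are rewritten as ISO strings (assuming the table uses day-month-year) so that string comparison orders them chronologically.

```python
from collections import Counter

# Sample rows copied from the tables above:
# (id, candidate_id, status_id, created_at).
# Dates are rewritten as ISO strings (an assumption) so they sort correctly.
status_changes = [
    (1, 1, 1, "2019-01-03"),
    (2, 1, 2, "2019-01-05"),
    (4, 2, 1, "2019-01-01"),
    (5, 3, 1, "2019-01-01"),
    (6, 4, 3, "2019-01-01"),
]
statuses = {1: "Review", 2: "Accepted", 3: "Rejected"}

# Keep only the most recent status change per candidate.
latest = {}
for change in status_changes:
    _, candidate_id, _, created_at = change
    if candidate_id not in latest or created_at > latest[candidate_id][3]:
        latest[candidate_id] = change

# Tally the status of each candidate's latest change.
counts = Counter(change[2] for change in latest.values())
last_status_count = [(statuses[sid], counts[sid]) for sid in statuses]
print(last_status_count)
```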

I was able to achieve this with this answer:

from django.db.models import Count, F, Max

qs = Status.objects.filter(
    status_changes__in=StatusChange.objects.annotate(
        last=Max('candidate__status_changes__created_at')
    ).filter(
        created_at=F('last')
    )
).annotate(
    nlast=Count('status_changes')
)

>>> [(q.name, q.nlast) for q in qs]
[('Review', 2), ('Accepted', 1), ('Rejected', 1)]

The issue, however, is that a status not referenced by any status change is omitted from the result. Instead, I would like it to be counted as zero. For example, if the statuses were

+----+--------------+
| id | name         |
+----+--------------+
|  1 | Review       |
|  2 | Accepted     |
|  3 | Rejected     |
|  4 | Banned       |
+----+--------------+

I would like to get:

+-----------+-------------+--------+
| status_id | status_name | count  |
+-----------+-------------+--------+
| 1         | Review      | 2      | 
| 2         | Accepted    | 1      | 
| 3         | Rejected    | 1      |
| 4         | Banned      | 0      |
+-----------+-------------+--------+

>>> [(q.name, q.nlast) for q in qs]
[('Review', 2), ('Accepted', 1), ('Rejected', 1), ('Banned', 0)]

What I tried

I solved this with an outer join in SQL, but I am not sure how to achieve that in Django. I tried creating a queryset with all counts annotated as zero and then merging it with the first one, but it did not work:

from django.db.models import Count, F, IntegerField, Max, Value

last_status_changes = Status.objects.filter(
    status_changes__in=StatusChange.objects.annotate(
        last=Max('candidate__status_changes__created_at')
    ).filter(
        created_at=F('last')
    )
).annotate(
    nlast=Count('status_changes')
)
zero_query = (
    Status.objects.all()
    .annotate(nlast=Value(0, output_field=IntegerField()))
    .exclude(pk__in=last_status_changes.values("id"))
)

>>> qs = last_status_changes | zero_query
>>> [(q.name, q.nlast) for q in qs]
[('Review', 3), ('Accepted', 1), ('Rejected', 1)]
# this would double count "Review" and include not only last but others

Any help is appreciated. Thanks.

Update 1

I was able to solve this with a raw query using a right join, but it would be great to do this using the ORM.

# Untested as I am using different model names in reality
SQL = """SELECT
        Min(status.id) as id
        , COUNT(latest_status_change.candidate_id) as nlast
    FROM
        (
        SELECT
            candidate_id,
            Max(created_at) AS latest_date
        FROM
            api_status_change
        GROUP BY candidate_id
        )
    AS latest_status_change
    INNER JOIN api_candidates ON (latest_status_change.candidate_id = api_candidates.id)
    INNER JOIN api_status_change ON 
        (
            latest_status_change.candidate_id = api_status_change.candidate_id
            AND 
            latest_status_change.latest_date = api_status_change.created_at
        )
    RIGHT JOIN api_status AS status  ON (api_status_change.status_id = `status`.id)
    GROUP BY status.name
    ;
"""
qs = Status.objects.raw(SQL)
>>> [(q.name, q.nlast) for q in qs]
[('Review', 2), ('Accepted', 1), ('Rejected', 1), ('Banned', 0)]
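The outer-join idea can be checked without a Django project. The following is a minimal sketch using Python's built-in sqlite3 module on the sample data, with simplified, hypothetical table names (`status`, `status_change`) instead of the real `api_*` tables. It uses a LEFT JOIN starting from `status` rather than a RIGHT JOIN, since older SQLite versions do not support RIGHT JOIN; the effect is the same. Dates are stored as ISO strings, which is an assumption made so that MAX() orders them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE status_change (
        id INTEGER PRIMARY KEY,
        candidate_id INTEGER,
        status_id INTEGER,
        created_at TEXT
    );
    INSERT INTO status VALUES
        (1, 'Review'), (2, 'Accepted'), (3, 'Rejected'), (4, 'Banned');
    INSERT INTO status_change VALUES
        (1, 1, 1, '2019-01-03'),
        (2, 1, 2, '2019-01-05'),
        (4, 2, 1, '2019-01-01'),
        (5, 3, 1, '2019-01-01'),
        (6, 4, 3, '2019-01-01');
""")

rows = conn.execute("""
    SELECT s.id, s.name, COUNT(last_sc.id) AS nlast
    FROM status AS s
    -- Every status is kept; statuses with no "latest" change get COUNT = 0.
    LEFT JOIN (
        SELECT sc.id, sc.status_id
        FROM status_change AS sc
        JOIN (
            SELECT candidate_id, MAX(created_at) AS latest_date
            FROM status_change
            GROUP BY candidate_id
        ) AS latest
          ON latest.candidate_id = sc.candidate_id
         AND latest.latest_date = sc.created_at
    ) AS last_sc ON last_sc.status_id = s.id
    GROUP BY s.id, s.name
    ORDER BY s.id
""").fetchall()
print(rows)
```

Because COUNT() ignores NULLs, the unmatched 'Banned' row produced by the outer join is counted as zero instead of being dropped.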


Answer 1:


The only problem here is that you are filtering your Status queryset by existing status changes while expecting the opposite result. In your case, the solution is to get rid of the unnecessary filtering:

last_status_changes = Status.objects.annotate(
    nlast=Count('status_changes')
).order_by(
    '-nlast'
)
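Keep in mind that `Count('status_changes')` without a filter counts every related status change, not only each candidate's latest one. A small plain-Python tally over the sample `status_change` rows from the question (an illustration only, not ORM code) shows what this annotation would report:

```python
from collections import Counter

# status_id column of the sample status_change table, in row order.
status_ids_of_all_changes = [1, 2, 1, 1, 3]
statuses = {1: "Review", 2: "Accepted", 3: "Rejected", 4: "Banned"}

counts = Counter(status_ids_of_all_changes)

# Mirror .order_by('-nlast'): highest count first.
all_changes_count = sorted(
    ((name, counts[sid]) for sid, name in statuses.items()),
    key=lambda pair: -pair[1],
)
print(all_changes_count)
```

Here 'Banned' appears with zero, as wanted, but 'Review' is reported as 3 because Beth's earlier Review entry is included as well.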

The other case is when you really do want to filter the changes (by date, for example):

from django.db import models
from django.db.models import Case, Count, F, When

changed_status_ids = Status.objects.filter(
    status_changes__created_at__gte='2020-03-03'
).values_list(
    'id',
    flat=True
)

Status.objects.annotate(
    c=Count('status_changes')
).annotate(
    cnt=Case(
        When(
            id__in=changed_status_ids,
            then=F('c')
        ),
        output_field=models.IntegerField(),
        default=0
    )
).values(
    'cnt',
    'name'
).order_by(
    '-cnt'
)




Answer 2:


I solved it with the queryset below:

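from django.db import models

# Status changes whose created_at equals the latest one for their candidate.
qs_last_status_changes = StatusChange.objects.annotate(
    _last_change=models.Max("candidate__status_changes__created_at")
).filter(created_at=models.F("_last_change"))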

qs_status = Status.objects\
    .annotate(count=models.Sum(
        models.Case(
            models.When(
                status_changes__in=qs_last_status_changes, 
                then=models.Value(1)
            ),
            output_field=models.IntegerField(),
            default=0,
        )
    )
)
>>> [(k.name, k.count) for k in qs_status]
[('Review', 2), ('Accepted', 1), ('Rejected', 1), ('Banned', 0)]

Thank you, Andrey Nelubin, for your suggestion.
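The `Sum(Case(When(...)))` annotation is a form of conditional aggregation. As a rough sketch of the idea, here is a comparable query written by hand and run with Python's built-in sqlite3 module on the sample data. The table names are simplified and hypothetical, dates are stored as ISO strings by assumption, and the SQL that Django actually generates for the queryset above may well look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE status_change (
        id INTEGER PRIMARY KEY,
        candidate_id INTEGER,
        status_id INTEGER,
        created_at TEXT
    );
    INSERT INTO status VALUES
        (1, 'Review'), (2, 'Accepted'), (3, 'Rejected'), (4, 'Banned');
    INSERT INTO status_change VALUES
        (1, 1, 1, '2019-01-03'),
        (2, 1, 2, '2019-01-05'),
        (4, 2, 1, '2019-01-01'),
        (5, 3, 1, '2019-01-01'),
        (6, 4, 3, '2019-01-01');
""")

rows = conn.execute("""
    SELECT s.name,
           -- Add 1 only for changes that are the latest for their candidate.
           SUM(CASE WHEN sc.id IN (
                   SELECT c.id
                   FROM status_change AS c
                   WHERE c.created_at = (
                       SELECT MAX(m.created_at)
                       FROM status_change AS m
                       WHERE m.candidate_id = c.candidate_id
                   )
               ) THEN 1 ELSE 0 END) AS count
    FROM status AS s
    LEFT JOIN status_change AS sc ON sc.status_id = s.id
    GROUP BY s.id, s.name
    ORDER BY s.id
""").fetchall()
print(rows)
```

The status table is never filtered, so every status survives the join; the CASE expression decides per joined row whether it contributes 1 or 0 to the sum, and the default of 0 is what gives 'Banned' a zero rather than a missing row.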



Source: https://stackoverflow.com/questions/60627961/django-annotated-query-to-count-all-entities-used-in-a-reverse-relationship
