Question
This is a follow-up to this SO question: Django Annotated Query to Count Only Latest from Reverse Relationship
Given these models:
class Candidate(BaseModel):
    name = models.CharField(max_length=128)

class Status(BaseModel):
    name = models.CharField(max_length=128)

class StatusChange(BaseModel):
    candidate = models.ForeignKey("Candidate", related_name="status_changes")
    status = models.ForeignKey("Status", related_name="status_changes")
    created_at = models.DateTimeField(auto_now_add=True, blank=True)
Represented by these tables:
candidates
+----+--------------+
| id | name         |
+----+--------------+
| 1  | Beth         |
| 2  | Mark         |
| 3  | Mike         |
| 4  | Ryan         |
+----+--------------+
status
+----+--------------+
| id | name         |
+----+--------------+
| 1  | Review       |
| 2  | Accepted     |
| 3  | Rejected     |
+----+--------------+
status_change
+----+--------------+-----------+------------+
| id | candidate_id | status_id | created_at |
+----+--------------+-----------+------------+
| 1  | 1            | 1         | 03-01-2019 |
| 2  | 1            | 2         | 05-01-2019 |
| 4  | 2            | 1         | 01-01-2019 |
| 5  | 3            | 1         | 01-01-2019 |
| 6  | 4            | 3         | 01-01-2019 |
+----+--------------+-----------+------------+
I wanted to get a count of each status type, but only include the last status for each candidate:
last_status_count
+-----------+-------------+--------+
| status_id | status_name | count  |
+-----------+-------------+--------+
| 1         | Review      | 2      |
| 2         | Accepted    | 1      |
| 3         | Rejected    | 1      |
+-----------+-------------+--------+
I was able to achieve this with this answer:
from django.db.models import Count, F, Max

# Keep only each candidate's most recent status change, then count per status.
qs = Status.objects.filter(
    status_changes__in=StatusChange.objects.annotate(
        last=Max('candidate__status_changes__created_at')
    ).filter(
        created_at=F('last')
    )
).annotate(
    nlast=Count('status_changes')
)

>>> [(q.name, q.nlast) for q in qs]
[('Review', 2), ('Accepted', 1), ('Rejected', 1)]
The issue, however, is that a status not referenced by any status change is omitted from the result. Instead, I would like it to be counted as zero. For example, if the statuses were
+----+--------------+
| id | name         |
+----+--------------+
| 1  | Review       |
| 2  | Accepted     |
| 3  | Rejected     |
| 4  | Banned       |
+----+--------------+
I would like to get:
+-----------+-------------+--------+
| status_id | status_name | count  |
+-----------+-------------+--------+
| 1         | Review      | 2      |
| 2         | Accepted    | 1      |
| 3         | Rejected    | 1      |
| 4         | Banned      | 0      |
+-----------+-------------+--------+
>>> [(q.name, q.nlast) for q in qs]
[('Review', 2), ('Accepted', 1), ('Rejected', 1), ('Banned', 0)]
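To make the target concrete, the desired result can be reproduced in plain Python from the sample rows above. This is only a sketch with no Django involved: the tuples mirror the tables, the helper name `last_status_counts` is made up for illustration, and the dates are assumed to be day-month-year.

```python
from datetime import date

# Sample rows copied from the tables above (dates assumed to be DD-MM-YYYY).
statuses = {1: "Review", 2: "Accepted", 3: "Rejected", 4: "Banned"}
status_changes = [
    # (id, candidate_id, status_id, created_at)
    (1, 1, 1, date(2019, 1, 3)),
    (2, 1, 2, date(2019, 1, 5)),
    (4, 2, 1, date(2019, 1, 1)),
    (5, 3, 1, date(2019, 1, 1)),
    (6, 4, 3, date(2019, 1, 1)),
]


def last_status_counts(statuses, status_changes):
    """Count candidates per status, using only each candidate's latest change."""
    latest = {}  # candidate_id -> (created_at, status_id)
    for _id, candidate_id, status_id, created_at in status_changes:
        if candidate_id not in latest or created_at > latest[candidate_id][0]:
            latest[candidate_id] = (created_at, status_id)

    # Start every status at zero so that unused statuses still appear.
    counts = {status_id: 0 for status_id in statuses}
    for _created_at, status_id in latest.values():
        counts[status_id] += 1
    return [(statuses[sid], n) for sid, n in sorted(counts.items())]


result = last_status_counts(statuses, status_changes)
print(result)
# [('Review', 2), ('Accepted', 1), ('Rejected', 1), ('Banned', 0)]
```

The key step is seeding `counts` with every status before counting, which is what an outer join from the status table achieves in SQL.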
What I tried
I solved this by doing an outer join in SQL, but I am not sure how to achieve that in Django. I tried creating a queryset with all counts annotated as zero and then merging it with the first one, but it did not work:
from django.db.models import Count, F, IntegerField, Max, Value

last_status_changes = Status.objects.filter(
    status_changes__in=StatusChange.objects.annotate(
        last=Max('candidate__status_changes__created_at')
    ).filter(
        created_at=F('last')
    )
).annotate(
    nlast=Count('status_changes')
)

# Every status that is missing from the first queryset, annotated with zero.
zero_query = (
    Status.objects.all()
    .annotate(nlast=Value(0, output_field=IntegerField()))
    .exclude(pk__in=last_status_changes.values("id"))
)

>>> qs = last_status_changes | zero_query
>>> [(q.name, q.nlast) for q in qs]
[('Review', 3), ('Accepted', 1), ('Rejected', 1)]
# this double counts "Review": it includes not only the last status change but the others too
Any help is appreciated. Thanks.
Update 1
I was able to solve this with a raw query using a right join, but it would be great to do this with the ORM:
# Untested as I am using different model names in reality
SQL = """
SELECT
    MIN(status.id) AS id,
    COUNT(latest_status_change.candidate_id) AS nlast
FROM
    (
        SELECT
            candidate_id,
            MAX(created_at) AS latest_date
        FROM api_status_change
        GROUP BY candidate_id
    ) AS latest_status_change
    INNER JOIN api_candidates
        ON latest_status_change.candidate_id = api_candidates.id
    INNER JOIN api_status_change
        ON api_status_change.candidate_id = api_candidates.id
        AND latest_status_change.latest_date = api_status_change.created_at
    RIGHT JOIN api_status AS status
        ON api_status_change.status_id = status.id
GROUP BY status.name;
"""
qs = Status.objects.raw(SQL)
>>> [(q.name, q.nlast) for q in qs]
[('Review', 2), ('Accepted', 1), ('Rejected', 1), ('Banned', 0)]
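The outer-join idea can be checked outside Django with the standard-library `sqlite3` module. The sketch below is not the query above: it uses simplified table names (`status`, `status_change`), ISO-formatted dates so that `MAX` on text compares correctly, and a LEFT JOIN starting from `status` in place of the RIGHT JOIN, since older SQLite versions do not support RIGHT JOIN. All of these are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE status_change (
    id INTEGER PRIMARY KEY,
    candidate_id INTEGER,
    status_id INTEGER,
    created_at TEXT
);
INSERT INTO status VALUES
    (1, 'Review'), (2, 'Accepted'), (3, 'Rejected'), (4, 'Banned');
INSERT INTO status_change VALUES
    (1, 1, 1, '2019-01-03'),
    (2, 1, 2, '2019-01-05'),
    (4, 2, 1, '2019-01-01'),
    (5, 3, 1, '2019-01-01'),
    (6, 4, 3, '2019-01-01');
""")

rows = conn.execute("""
SELECT status.name, COUNT(last_change.candidate_id) AS nlast
FROM status
LEFT JOIN (
    -- one row per candidate: the status of the most recent change
    SELECT sc.candidate_id, sc.status_id
    FROM status_change AS sc
    INNER JOIN (
        SELECT candidate_id, MAX(created_at) AS latest_date
        FROM status_change
        GROUP BY candidate_id
    ) AS latest
        ON latest.candidate_id = sc.candidate_id
       AND latest.latest_date = sc.created_at
) AS last_change ON last_change.status_id = status.id
GROUP BY status.id, status.name
ORDER BY status.id
""").fetchall()

print(rows)
# [('Review', 2), ('Accepted', 1), ('Rejected', 1), ('Banned', 0)]
```

`COUNT(last_change.candidate_id)` skips the NULL produced for a status with no matching row, which is why "Banned" comes out as 0 rather than 1.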
Answer 1:
The only problem here is that you are filtering your Status queryset by existing status changes and expecting the complete opposite result. In your case the solution is to get rid of the unnecessary filtering:
last_status_changes = Status.objects.annotate(
    nlast=Count('status_changes')
).order_by(
    '-nlast'
)
The other case would be if you really want to filter your changes (by date, for example):
from django.db import models
from django.db.models import Case, Count, F, When

# IDs of statuses that have at least one change on or after the given date.
changed_status_ids = Status.objects.filter(
    status_changes__created_at__gte='2020-03-03'
).values_list(
    'id',
    flat=True
)

Status.objects.annotate(
    c=Count('status_changes')
).annotate(
    cnt=Case(
        When(
            id__in=changed_status_ids,
            then=F('c')
        ),
        output_field=models.IntegerField(),
        default=0
    )
).values(
    'cnt',
    'name'
).order_by(
    '-cnt'
)
Answer 2:
I solved it with the queryset below:
from django.db import models

# Status changes that are the most recent one for their candidate.
qs_last_status_changes = StatusChange.objects.annotate(
    _last_change=models.Max("candidate__status_changes__created_at")
).filter(created_at=models.F("_last_change"))

# Add 1 for each status change that is a latest change, 0 otherwise.
qs_status = Status.objects.annotate(
    count=models.Sum(
        models.Case(
            models.When(
                status_changes__in=qs_last_status_changes,
                then=models.Value(1),
            ),
            output_field=models.IntegerField(),
            default=0,
        )
    )
)

>>> [(k.name, k.count) for k in qs_status]
[('Review', 2), ('Accepted', 1), ('Rejected', 1), ('Banned', 0)]
Thank you, Andrey Nelubin, for your suggestion.
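For readers who want to see the Sum/Case/When idea in isolation, here is a rough SQL analogue run through the standard-library `sqlite3` module. It is a hand-written sketch, not the SQL that Django generates for the queryset above; the table names, the ISO date strings, and the `last_count` alias are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE status_change (
    id INTEGER PRIMARY KEY,
    candidate_id INTEGER,
    status_id INTEGER,
    created_at TEXT
);
INSERT INTO status VALUES
    (1, 'Review'), (2, 'Accepted'), (3, 'Rejected'), (4, 'Banned');
INSERT INTO status_change VALUES
    (1, 1, 1, '2019-01-03'),
    (2, 1, 2, '2019-01-05'),
    (4, 2, 1, '2019-01-01'),
    (5, 3, 1, '2019-01-01'),
    (6, 4, 3, '2019-01-01');
""")

rows = conn.execute("""
SELECT status.name,
       SUM(CASE WHEN sc.id IN (
               -- ids of changes that are the latest for their candidate
               SELECT inner_sc.id
               FROM status_change AS inner_sc
               WHERE inner_sc.created_at = (
                   SELECT MAX(created_at)
                   FROM status_change
                   WHERE candidate_id = inner_sc.candidate_id
               )
           ) THEN 1 ELSE 0 END) AS last_count
FROM status
LEFT JOIN status_change AS sc ON sc.status_id = status.id
GROUP BY status.id, status.name
ORDER BY status.id
""").fetchall()

print(rows)
# [('Review', 2), ('Accepted', 1), ('Rejected', 1), ('Banned', 0)]
```

Because every status row survives the LEFT JOIN and non-matching rows fall into the `ELSE 0` branch, a status with no latest change still appears with a count of 0.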
Source: https://stackoverflow.com/questions/60627961/django-annotated-query-to-count-all-entities-used-in-a-reverse-relationship