问题
The following shows the distinct count is 2247
In [6]: VirtualMachineResources.objects.all().values('machine', 'cluster')
...: .distinct().count()
Out[6]: 2247
But then when I loop through it, it returned way more than it should:
In [4]: a = []
...: for resource in VirtualMachineResources.objects.all().values('mach
...: ine', 'cluster').distinct():
...: if resource['cluster']:
...: a.append(resource['cluster'])
...:
...:
In [5]: len(a)
Out[5]: 96953
Given the records have cluster field being empty, when I loop through the queryset, I can see a lot of repeats of the same machine without cluster value, but I would expect only once.
for resource in VirtualMachineResources.objects.all().values('machine', 'cluster').distinct():
print(resource['machine'], resource['cluster'])
prints...
server1
server1
server1
It's a postgres database. Any ideas? Having a few more digs on google and this seems related?
UPDATE: Created a Django issue here
回答1:
You have a defined VirtualMachineResources.Meta.ordering
that confuses the ORM as the ordered by columns must appear in the SELECT
clause when using DISTINCT
. The ordering happens to be cleared when using .count()
.
Long storing short, add a .order_by()
to drop your Meta.ordering
on the queryset you are iterating over and you should be good to go. There's no good way to generate a DISTINCT
that excludes ordering fields on Django's ORM yet as that would require a complex subquery pushdown as detailed in #24218.
By the way, please avoid Django's ticket tracker as a second tier support channel when you are not receiving a reply to your question as fast as you'd like.
来源:https://stackoverflow.com/questions/54354462/django-distinct-returns-more-records-than-count