问题
I'm trying to annotate a queryset of Station
s with the id
of the nearest neighbouring Station
using Django 2.0.3 and PostGIS (GeoDjango) functions.
Simplified Station
model:
class Station(models.Model):
name = models.CharField(max_length=128)
location = models.PointField()
objects = StationQuerySet.as_manager()
The problem I'm having is trying to compute the closest distance, which involves annotating a subquery which refers to the location
in the outer queryset.
from django.db.models import OuterRef, Subquery
from django.contrib.gis.db.models.functions import Distance
class StationQuerySet(models.QuerySet):
def add_nearest_neighbour(self):
'''
Annotates each station with the id and distance of the nearest neighbouring station
'''
# Get Station model
Station = self.model
# Calculate distances to each station in subquery
subquery_with_distance = Station.objects.annotate(distance=Distance('location', OuterRef('location')) / 1000)
# Get nearest from subquery
nearest = subquery_with_distance.order_by('distance').values('id')[0]
return self.annotate(
nearest_station_id=Subquery(nearest)
)
The line distance = Station.objects.annotate(distance=Distance('location', OuterRef('location')) / 1000)
results in an error seen below:
from apps.bikeshare.models import Station
stations = Station.objects.add_nearest_neighbour()
Error:
Traceback (most recent call last):
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2847, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-3-cb35ea6d5d8b>", line 1, in <module>
stations = Station.objects.add_nearest_neighbour()
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/manager.py", line 82, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/home/gbrown/Development/transit_bikeshare/apps/bikeshare/querysets.py", line 162, in add_nearest_neighbour
subquery_with_distance = Station.objects.annotate(distance=Distance('location', OuterRef('location')) / 1000)
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/manager.py", line 82, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/query.py", line 997, in annotate
clone.query.add_annotation(annotation, alias, is_summary=False)
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/sql/query.py", line 975, in add_annotation
summarize=is_summary)
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/expressions.py", line 452, in resolve_expression
c.lhs = c.lhs.resolve_expression(query, allow_joins, reuse, summarize, for_save)
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/contrib/gis/db/models/functions.py", line 58, in resolve_expression
source_fields = res.get_source_fields()
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/expressions.py", line 349, in get_source_fields
return [e._output_field_or_none for e in self.get_source_expressions()]
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/expressions.py", line 349, in <listcomp>
return [e._output_field_or_none for e in self.get_source_expressions()]
AttributeError: 'ResolvedOuterRef' object has no attribute '_output_field_or_none'
回答1:
Came up with a work around using a raw query to find the nearest station and select the id AND distance from the subquery, bonus explanation below:
class StationQuerySet(models.QuerySet):
def nearest_neighbour(self):
'''
Creates a RawQuerySet of each station with the id and distance of the nearest neighbouring station
'''
# Have to execute the query in order to get the list of ids to inject
ids = tuple(self.values('id').values_list('id', flat=True))
return self.raw('''
SELECT
A0.id as id,
SUB.closest_id,
SUB.closest_distance
FROM "bikeshare_station" A0
CROSS JOIN LATERAL (
SELECT
B0.id as closest_id,
st_distance_sphere(A0.location, B0.location) as closest_distance
FROM "bikeshare_station" B0
WHERE A0.id != B0.id
ORDER BY A0.location <-> B0.location
limit 1
) SUB
WHERE A0.id IN %s;
''', [ids])
Usage
You can chain querysets calls together to filter down the queryset before finding the nearest neighbour:
query = Station.objects.filter(name='Albert Gate, Hyde Park')
closest_stations = query.nearest_neighbour()
station = closest_stations[0]
station.name
[out]: 'Albert Gate, Hyde Park'
station.closest_distance
[out]: 133.52459069
station.closest_id
[out]: 6369
SQL Explanation
This type of subquery is called a correlated subquery because it references columns in the outer query. In addition, I need to select multiple pieces of information about the nearest station (id
, distance
, etc.).
The subquery is placed in the FROM
clause, which allows multiple columns to be selected. A LATERAL
join is needed to allow the subquery to reference the sibling table in the FROM
list. With the subquery returning a single row, a CROSS
join can be applied to form a joined table based on the cartesian product rather than on a shared column.
The subquery uses the PostGIS <->
operator, which is much more efficient at ordering the table by distance between the stations, and st_distance_sphere
, to do an accurate distance calculation between the points.
来源:https://stackoverflow.com/questions/49819773/django-annotation-on-subquery