Question
Why is this not returning a count of the number of points in each neighbourhood (bounding box)?
import geopandas as gpd
def radius(points_neighbour, points_center, new_field_name, r):
    """
    :param points_neighbour:
    :param points_center:
    :param new_field_name: new field_name attached to points_center
    :param r: radius around points_center
    :return:
    """
    sindex = points_neighbour.sindex
    pts_in_neighbour = []
    for i, pt_center in points_center.iterrows():
        nearest_index = list(sindex.intersection((pt_center.LATITUDE-r, pt_center.LONGITUDE-r, pt_center.LATITUDE+r, pt_center.LONGITUDE+r)))
        pts_in_this_neighbour = points_neighbour[nearest_index]
        pts_in_neighbour.append(len(pts_in_this_neighbour))
    points_center[new_field_name] = gpd.GeoSeries(pts_in_neighbour)
Every loop gives the same result.
Second question: how can I find the k-th nearest neighbour?
More information about the problem itself:
We are doing it at a very small scale e.g. Washington State, US or British Columbia, Canada
We hope to utilize geopandas as much as possible since it's similar to pandas and supports spatial indexing: RTree
For example, sindex here has method nearest, intersection, etc.
Please comment if you need more information. This is the code in class GeoPandasBase
@property
def sindex(self):
    if not self._sindex_generated:
        self._generate_sindex()
    return self._sindex
I tried Richard's example, but it didn't work:
def radius(points_neighbour, points_center, new_field_name, r):
    """
    :param points_neighbour:
    :param points_center:
    :param new_field_name: new field_name attached to points_center
    :param r: radius around points_center
    :return:
    """
    sindex = points_neighbour.sindex
    pts_in_neighbour = []
    for i, pt_center in points_center.iterrows():
        pts_in_this_neighbour = 0
        for n in sindex.intersection(((pt_center.LATITUDE-r, pt_center.LONGITUDE-r, pt_center.LATITUDE+r, pt_center.LONGITUDE+r))):
            dist = pt_center.distance(points_neighbour['geometry'][n])
            if dist < radius:
                pts_in_this_neighbour = pts_in_this_neighbour + 1
        pts_in_neighbour.append(pts_in_this_neighbour)
    points_center[new_field_name] = gpd.GeoSeries(pts_in_neighbour)
To download the shapefile, go to https://catalogue.data.gov.bc.ca/dataset/hellobc-activities-and-attractions-listing and choose the ArcView format.
Answer 1:
Rather than answer your question directly, I'd argue that you're doing this wrong. After arguing this, I'll give a better answer.
Why you're doing it wrong
An r-tree is great for bounding-box queries in two or three Euclidean dimensions.
You are looking up longitude-latitude points on a two-dimensional surface curved in a three-dimensional space. The upshot is that your coordinate system will yield singularities and discontinuities: 180°W is the same as 180°E, 2°E by 90°N is close to 2°W by 90°N. The r-tree does not capture these sorts of things!
But even if an r-tree were a good fit for this data, taking lat±r and lon±r yields a square region; you probably want a circular region around your point.
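To make the discontinuity concrete, here is a minimal sketch using only the standard library. The two sample points and the helper name `haversine_km` are illustrative assumptions, not part of the question's data. It compares the naive longitude gap with the great-circle distance for two points on either side of the 180° meridian.

```python
import math

R_EARTH_KM = 6371.0  # mean Earth radius in km (assumed, matches Rearth used later)

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon-lat points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * R_EARTH_KM * math.asin(math.sqrt(a))

# Two illustrative points on the equator, just either side of the antimeridian
east = (179.9, 0.0)   # (lon, lat)
west = (-179.9, 0.0)  # (lon, lat)

# What a lon-lat bounding box "sees": the points look almost a full circle apart
naive_gap_deg = abs(east[0] - west[0])

# What is true on the sphere: the points are close neighbours
true_gap_km = haversine_km(*east, *west)

print(naive_gap_deg, true_gap_km)
```

The longitude gap is nearly 360 degrees, yet the points are only about 22 km apart, so a bounding box of lon±r around one point would miss the other.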
How to do it right
Rather than keeping the points in lon-lat format, convert them to xyz format using a spherical coordinate conversion. Now they are in a 3D Euclidean space and there are no singularities or discontinuities.
Place the points in a three-dimensional kd-tree. This allows you to quickly, in O(log n) time, ask questions like "What are the k nearest neighbours of this point?" and "What are all the points within a radius r of this point?" SciPy comes with an implementation.
For your radius search, convert from a Great Circle radius to a chord: this makes the search in 3-space equivalent to a radius search on a circle wrapped to the surface of a sphere (in this case, the Earth).
Code for doing it right
I've implemented the foregoing in Python as a demonstration. Note that all spherical points are stored in (longitude,latitude)/(x-y) format using a lon=[-180,180], lat=[-90,90] scheme. All 3D points are stored in (x,y,z) format.
#!/usr/bin/env python3
import numpy as np
import scipy as sp
import scipy.spatial
Rearth = 6371
#Generate uniformly-distributed lon-lat points on a sphere
#See: http://mathworld.wolfram.com/SpherePointPicking.html
def GenerateUniformSpherical(num):
    #Generate random variates
    pts = np.random.uniform(low=0, high=1, size=(num,2))
    #Convert to sphere space
    pts[:,0] = 2*np.pi*pts[:,0] #0-360 degrees
    pts[:,1] = np.arccos(2*pts[:,1]-1) #0-180 degrees
    #Convert to degrees
    pts = np.degrees(pts)
    #Shift ranges to lon-lat
    pts[:,0] -= 180
    pts[:,1] -= 90
    return pts
def ConvertToXYZ(lonlat):
    theta = np.radians(lonlat[:,0])+np.pi
    phi = np.radians(lonlat[:,1])+np.pi/2
    x = Rearth*np.cos(theta)*np.sin(phi)
    y = Rearth*np.sin(theta)*np.sin(phi)
    z = Rearth*np.cos(phi)
    return np.transpose(np.vstack((x,y,z)))
#Get all points which lie within `r_km` Great Circle kilometres of the query
#points `qpts`.
def GetNeighboursWithinR(qpts,kdtree,r_km):
    #We need to convert Great Circle kilometres into chord length kilometres in
    #order to use the kd-tree
    #See: http://mathworld.wolfram.com/CircularSegment.html
    angle = r_km/Rearth
    chord_length = 2*Rearth*np.sin(angle/2)
    pts3d = ConvertToXYZ(qpts)
    #See: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.query_ball_point.html#scipy.spatial.KDTree.query_ball_point
    #p=2 implies Euclidean distance, eps=0 implies no approximation (slower)
    return kdtree.query_ball_point(pts3d,chord_length,p=2,eps=0)
##############################################################################
#WARNING! Do NOT alter pts3d or kdtree will malfunction and need to be rebuilt
##############################################################################
##############################
#Correctness tests on the North, South, East, and West poles, along with Kolkata
ptsll = np.array([[0,90],[0,-90],[0,0],[-180,0],[88.3639,22.5726]])
pts3d = ConvertToXYZ(ptsll)
kdtree = sp.spatial.KDTree(pts3d, leafsize=10) #Stick points in kd-tree for fast look-up
qptsll = np.array([[-3,88],[5,-85],[10,10],[-178,3],[175,4]])
GetNeighboursWithinR(qptsll, kdtree, 2000)
##############################
#Stress tests
ptsll = GenerateUniformSpherical(100000) #Generate uniformly-distributed lon-lat points on a sphere
pts3d = ConvertToXYZ(ptsll) #Convert points to 3d
#See: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.html
kdtree = sp.spatial.KDTree(pts3d, leafsize=10) #Stick points in kd-tree for fast look-up
qptsll = GenerateUniformSpherical(100) #We'll find neighbours near these points
GetNeighboursWithinR(qptsll, kdtree, 500)
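The demonstration above only exercises the radius search. For the k-nearest-neighbour question, a minimal sketch could look like the following. It assumes SciPy's `cKDTree` and its `query` method; the helper names `lonlat_to_xyz` and `k_nearest` are hypothetical, and the conversion uses the conventional latitude-based formula rather than `ConvertToXYZ` (either works, since only distances between points matter). The chord lengths returned by the tree are converted back to Great Circle kilometres by inverting chord = 2R·sin(d/2R).

```python
import numpy as np
from scipy.spatial import cKDTree

R_EARTH_KM = 6371.0  # same Earth radius as Rearth above

def lonlat_to_xyz(lonlat_deg):
    """Convert an (n, 2) array of lon-lat degrees to (n, 3) points on the sphere."""
    lon = np.radians(lonlat_deg[:, 0])
    lat = np.radians(lonlat_deg[:, 1])
    x = R_EARTH_KM * np.cos(lat) * np.cos(lon)
    y = R_EARTH_KM * np.cos(lat) * np.sin(lon)
    z = R_EARTH_KM * np.sin(lat)
    return np.column_stack((x, y, z))

def k_nearest(query_lonlat_deg, tree, k):
    """Return (great-circle km, indices) of the k nearest tree points per query."""
    chord, idx = tree.query(lonlat_to_xyz(query_lonlat_deg), k=k)
    # Invert chord = 2R*sin(d/(2R)); clip guards against rounding just above 1
    ratio = np.clip(chord / (2 * R_EARTH_KM), 0.0, 1.0)
    great_circle_km = 2 * R_EARTH_KM * np.arcsin(ratio)
    return great_circle_km, idx

# Same five reference points as the correctness test above
pts = np.array([[0, 90], [0, -90], [0, 0], [-180, 0], [88.3639, 22.5726]],
               dtype=float)
tree = cKDTree(lonlat_to_xyz(pts))

# Two nearest neighbours of a point 2 degrees away from the North Pole
dist_km, idx = k_nearest(np.array([[-3.0, 88.0]]), tree, k=2)
print(idx, dist_km)
```

With `k` greater than 1 the results have one row per query point and one column per neighbour, ordered from nearest to farthest, so the k-th nearest neighbour is the last column.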
Answer 2:
I've attached code which should, with some minor modifications, do what you want.
I think your problem arose for one of two reasons:
You were not correctly constructing the spatial index. Your responses to my comments suggested that you weren't wholly aware of how the spatial index was getting made.
The bounding box for your spatial query was not built correctly.
I'll discuss both possibilities below.
Constructing the spatial index
As it turns out, the spatial index is constructed simply by typing:
sindex = gpd_df.sindex
Magic.
But where does gpd_df.sindex get its data? It assumes that the data is stored in a column called geometry in a shapely format. If you have not added data to such a column, it will raise an error.
A correct initialization of the data frame would look like so:
#Generate random points throughout Oregon
x = np.random.uniform(low=oregon_xmin, high=oregon_xmax, size=10000)
y = np.random.uniform(low=oregon_ymin, high=oregon_ymax, size=10000)
#Turn the lat-long points into a geodataframe
gpd_df = gpd.GeoDataFrame(data={'x':x, 'y':y})
#Set up point geometries so that we can index the data frame
#Note that I am using x-y points!
gpd_df['geometry'] = gpd_df.apply(lambda row: shapely.geometry.Point((row['x'], row['y'])), axis=1)
#Automagically constructs a spatial index from the `geometry` column
gpd_df.sindex
Seeing the foregoing sort of example code in your question would have been helpful in diagnosing your problem and getting going on solving it.
Since you did not get the extremely obvious error geopandas raises when a geometry column is missing:
AttributeError: No geometry data set yet (expected in column 'geometry'.
I think you've probably done this part right.
Constructing the bounding box
In your question, you form a bounding box like so:
nearest_index = list(sindex.intersection((pt_center.LATITUDE-r, pt_center.LONGITUDE-r, pt_center.LATITUDE+r, pt_center.LONGITUDE+r)))
As it turns out, bounding boxes have the form:
(West, South, East, North)
At least, they do for X-Y styled-points, e.g. shapely.geometry.Point(Lon,Lat)
In my code, I use the following:
bbox = (cpt.x-radius, cpt.y-radius, cpt.x+radius, cpt.y+radius)
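To see why the order matters, here is a small pure-Python sketch with no geopandas involved. The helpers `make_bbox` and `bbox_contains` and the sample coordinates are hypothetical, and it assumes the geometries are stored as Point(Lon, Lat). It builds the box once in (West, South, East, North) order and once with latitude and longitude swapped, as in the question, then checks whether the centre point itself falls inside each box.

```python
def make_bbox(x, y, r):
    """Bounding box in (minx, miny, maxx, maxy) order, i.e. (West, South, East, North)."""
    return (x - r, y - r, x + r, y + r)

def bbox_contains(bbox, x, y):
    """True if the x-y point lies inside the box (edges included)."""
    minx, miny, maxx, maxy = bbox
    return minx <= x <= maxx and miny <= y <= maxy

# Illustrative centre point in British Columbia: x = longitude, y = latitude
lon, lat, r = -123.1, 49.3, 0.5

correct = make_bbox(lon, lat, r)  # x first, then y
swapped = make_bbox(lat, lon, r)  # latitude passed as x, as in the question

correct_hit = bbox_contains(correct, lon, lat)
swapped_hit = bbox_contains(swapped, lon, lat)

print(correct, correct_hit)
print(swapped, swapped_hit)
```

The swapped box spans x values around 49 and y values around -123, so it does not even contain its own centre point. A query with such a box would be expected to miss every point stored in Lon-Lat order.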
Working example
Putting the above together leads me to this working example. Note that I also demonstrate how to sort points by distance, answering your second question.
#!/usr/bin/env python3
import numpy as np
import numpy.random
import geopandas as gpd
import shapely.geometry
import operator
oregon_xmin = -124.5664
oregon_xmax = -116.4633
oregon_ymin = 41.9920
oregon_ymax = 46.2938
def radius(gpd_df, cpt, radius):
    """
    :param gpd_df: Geopandas dataframe in which to search for points
    :param cpt: Point about which to search for neighbouring points
    :param radius: Radius about which to search for neighbours
    :return: List of point indices around the central point, sorted by
             distance in ascending order
    """
    #Spatial index
    sindex = gpd_df.sindex
    #Bounding box of rtree search (West, South, East, North)
    bbox = (cpt.x-radius, cpt.y-radius, cpt.x+radius, cpt.y+radius)
    #Potential neighbours
    good = []
    for n in sindex.intersection(bbox):
        dist = cpt.distance(gpd_df['geometry'][n])
        if dist<radius:
            good.append((dist,n))
    #Sort list in ascending order by `dist`, then `n`
    good.sort()
    #Return only the neighbour indices, sorted by distance in ascending order
    return [x[1] for x in good]
#Generate random points throughout Oregon
x = np.random.uniform(low=oregon_xmin, high=oregon_xmax, size=10000)
y = np.random.uniform(low=oregon_ymin, high=oregon_ymax, size=10000)
#Turn the lat-long points into a geodataframe
gpd_df = gpd.GeoDataFrame(data={'x':x, 'y':y})
#Set up point geometries so that we can index the data frame
gpd_df['geometry'] = gpd_df.apply(lambda row: shapely.geometry.Point((row['x'], row['y'])), axis=1)
#The 'x' and 'y' columns are now stored as part of the geometry, so we remove
#their columns in order to save space
del gpd_df['x']
del gpd_df['y']
for i, row in gpd_df.iterrows():
    neighbours = radius(gpd_df,row['geometry'],0.5)
    print(neighbours)
    #Use len(neighbours) here to construct a new row for the data frame
(What I had been requesting in the comments is code that looks like the foregoing, but which exemplifies your problem. Note the use of random to succinctly generate a dataset for experimentation.)
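Because the neighbour list comes back sorted by distance, picking the k-th nearest neighbour is just an indexing step. The following is a minimal pure-Python sketch of that idea. It replaces the r-tree lookup with a brute-force bounding-box check so that it runs without geopandas, and the names `neighbours_sorted` and `kth_nearest` and the sample points are hypothetical.

```python
import math

def neighbours_sorted(points, centre, r):
    """Indices of points within distance r of centre, nearest first."""
    cx, cy = centre
    good = []
    for n, (x, y) in enumerate(points):
        # Cheap bounding-box filter (stand-in for sindex.intersection)
        if not (cx - r <= x <= cx + r and cy - r <= y <= cy + r):
            continue
        # Exact distance check trims the square down to a circle
        dist = math.hypot(x - cx, y - cy)
        if dist < r:
            good.append((dist, n))
    good.sort()  # ascending by distance, then by index
    return [n for _, n in good]

def kth_nearest(points, centre, r, k):
    """Index of the k-th nearest point (1-based) within r, or None if too few."""
    ranked = neighbours_sorted(points, centre, r)
    return ranked[k - 1] if 1 <= k <= len(ranked) else None

# Illustrative x-y points; (1.2, 1.2) is inside the box but outside the circle
pts = [(0.0, 0.0), (3.0, 0.0), (1.0, 0.0), (0.9, 0.9), (1.2, 1.2)]

ranked = neighbours_sorted(pts, (0.0, 0.0), 1.5)
second = kth_nearest(pts, (0.0, 0.0), 1.5, 2)
fourth = kth_nearest(pts, (0.0, 0.0), 1.5, 4)

print(ranked, second, fourth)
```

Note that the search is limited to the chosen radius: if fewer than k points fall inside it, the sketch returns None rather than widening the search.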
Source: https://stackoverflow.com/questions/44622233/rtree-count-points-in-the-neighbourhoods-within-each-point-of-another-set-of-po