How do I select objects within a geographic region in a pandas DataFrame

Posted by 感情迁移 on 2020-06-08 15:11:24

Question


I'm trying to select objects within a region from a pandas DataFrame that contains a list of item IDs and lat/lon pairs. Is there a selection method for this? I think this would be similar to the following SO question, but using pandas instead of SQL:

Selecting geographical points within area

Here is my table, saved in locations.csv:

ID, LAT, LON
001,35.00,-75.00
002,35.01,-80.00 
...
999,25.76,-64.00

I can load the dataframe, and select a rectangular region:

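import pandas as pd

# skipinitialspace=True drops the blank after each comma, so the header
# "ID, LAT, LON" yields the column names 'ID', 'LAT' and 'LON'
df = pd.read_csv('locations.csv', delimiter=',', skipinitialspace=True)
lat_max = 32.323496
lat_min = 25.712767
lon_max = -72.863358
lon_min = -74.729456
# Combine the four conditions into one boolean mask rather than chaining [] lookups
in_box = (df['LAT'] > lat_min) & (df['LAT'] < lat_max) & (df['LON'] > lon_min) & (df['LON'] < lon_max)
small_df = df[in_box]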

How would I select objects within an irregular region?

How would I structure the dataframe selection command?

I can build a lambda function that will produce a True value for LAT and LON within the region but I'm not sure how to use that with a pandas dataframe.


Answer 1:


The working code below selects points within a region by first creating two GeoDataFrames. The first contains a polygon; the second contains all the points to be spatially joined with it. The spatial join uses the within operator, so only the points that fall inside the polygon are selected. The result is also a GeoDataFrame, containing only the points that lie within the polygon.

The content of locations.csv: 6 lines, including the column headers. Note: there are no spaces in the header row.

ID,LAT,LON
1, 15.1, 10.0
2, 15.2, 15.1
3, 15.3, 20.2
4, 15.4, 25.3
5, 15.5, 30.4
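The note about the header row matters because, by default, pandas keeps a blank that follows a comma as part of the column name. The sketch below illustrates this with an in-memory copy of a header written with spaces (the `csv_text` string is made up for the illustration, not the file above) and shows that `skipinitialspace=True` is one way to tolerate such a header.

```python
import io

import pandas as pd

# Hypothetical CSV text whose header has a blank after each comma
csv_text = "ID, LAT, LON\n1, 15.1, 10.0\n2, 15.2, 15.1\n"

# Default parsing keeps the leading blank in the column names
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))    # ['ID', ' LAT', ' LON']

# skipinitialspace=True discards blanks that follow the delimiter
clean = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(list(clean.columns))  # ['ID', 'LAT', 'LON']
```

With the default parsing, an attribute lookup such as `raw.LAT` would fail because the column is actually named `' LAT'`.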

The code:

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from shapely.wkt import loads

# Create a geo-dataframe `polygon_df` having 1 row of polygon
# This polygon will be used to select points in a geodataframe
d = {'poly_id':[1], 'wkt':['POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))']}
df = pd.DataFrame( data=d )
geometry = [loads(pgon) for pgon in df.wkt]
# The dict form {'init': 'epsg:4326'} is deprecated; pass the CRS as a string
polygon_df = gpd.GeoDataFrame(df, crs='EPSG:4326', geometry=geometry)

# One can plot this polygon with the command:
# polygon_df.plot()

# Read the file with `pandas`
locs = pd.read_csv('locations.csv', sep=',')

# Make it a geo-dataframe named `geo_locs`, with one Point(lon, lat) per row
locs_geom = [Point(xy) for xy in zip(locs.LON, locs.LAT)]
geo_locs = gpd.GeoDataFrame(locs, crs='EPSG:4326', geometry=locs_geom)

# Do a spatial join of `point` within `polygon`, get the result in `pts_in_poly` GeoDataFrame.
# Newer geopandas releases name this keyword `predicate`; older ones used `op`.
pts_in_poly = gpd.sjoin(geo_locs, polygon_df, predicate='within', how='inner')

# Print the ID of the points that fall within the polygon.
print(pts_in_poly.ID)

# The output will be:
#2    3
#3    4
#4    5
#Name: ID, dtype: int64

# Plot the polygon and all the points.
ax1 = polygon_df.plot(color='lightgray', zorder=1)
geo_locs.plot(ax=ax1, zorder=5, color="red")

The output plot:

In the plot, the points with IDs 3, 4, and 5 fall within the polygon.
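If installing geopandas is not an option, the idea from the question (a function that returns True for a LAT/LON inside the region) can be applied row by row to build a boolean mask. The following is only a minimal sketch under stated assumptions: `point_in_polygon` is a hypothetical helper written for this example using the even-odd ray-casting rule, it treats longitude and latitude as plain planar coordinates, and it makes no promise about points lying exactly on the boundary. It reuses the polygon vertices and sample points from above.

```python
import pandas as pd


def point_in_polygon(x, y, vertices):
    """Hypothetical helper: even-odd ray-casting test in planar coordinates."""
    inside = False
    j = len(vertices) - 1
    for i in range(len(vertices)):
        xi, yi = vertices[i]
        xj, yj = vertices[j]
        # Only edges that straddle the horizontal line through y can be crossed
        if (yi > y) != (yj > y):
            # x coordinate where this edge meets the horizontal line through y
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


# Same sample points as locations.csv above
locs = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'LAT': [15.1, 15.2, 15.3, 15.4, 15.5],
    'LON': [10.0, 15.1, 20.2, 25.3, 30.4],
})

# Polygon vertices as (lon, lat) pairs, taken from the WKT string above
region = [(30, 10), (40, 40), (20, 40), (10, 20)]

# apply(..., axis=1) calls the lambda once per row and returns a boolean Series
mask = locs.apply(lambda row: point_in_polygon(row['LON'], row['LAT'], region), axis=1)
inside_df = locs[mask]
print(inside_df['ID'].tolist())  # [3, 4, 5]
```

A row-wise `apply` is slow on large tables, so this is best seen as an illustration of how a predicate plugs into DataFrame selection; the spatial join above is likely the better choice for real data.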



Source: https://stackoverflow.com/questions/55637143/how-do-i-select-objects-within-a-geographic-regions-in-a-pandas-dataframe
