How to do a point in polygon query efficiently using geopandas?

长情又很酷 2021-01-20 07:43

I have a shapefile that has all the counties for the US, and I am doing a bunch of queries at a lat/lon point and then finding what county the point lies in. Right now I am

1 answer
  •  小蘑菇 (original poster)
    2021-01-20 08:12

    Your situation looks like a typical case where spatial joins are useful. The idea of spatial joins is to merge data using geographic coordinates instead of using attributes.

    geopandas offers three commonly used spatial predicates for this:

    • intersects
    • within
    • contains
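
    To make the difference between the three predicates concrete, here is a minimal sketch using shapely, the geometry library that geopandas builds on. The square and the points are made-up illustrative geometries, not data from the question.

```python
from shapely.geometry import Point, Polygon

# Hypothetical geometries: a unit square and three test points.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
inside = Point(0.5, 0.5)
outside = Point(2.0, 2.0)
on_edge = Point(1.0, 0.5)

# within: the left geometry lies inside the right one.
inside_within = inside.within(square)
outside_within = outside.within(square)

# contains: the inverse of within (polygon on the left).
square_contains = square.contains(inside)

# intersects: the geometries share at least one point.
inside_intersects = inside.intersects(square)
outside_intersects = outside.intersects(square)

# A point exactly on the boundary intersects the square
# but is not considered to be within it.
edge_within = on_edge.within(square)
edge_intersects = on_edge.intersects(square)

print(inside_within, square_contains, edge_within, edge_intersects)
```

    The boundary case is the practical reason to pick intersects over within when points may fall exactly on a shared border.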

    It seems like you want within, which is possible using the following syntax:

    # geopandas >= 0.10 names this keyword `predicate`; older releases used `op`
    geopandas.sjoin(points, polygons, how="inner", predicate='within')
    

    Note: sjoin relies on a spatial index. Older geopandas releases need rtree installed for this; if it is missing, install it with pip or conda. Newer releases can use the spatial index provided by shapely 2.0 instead.
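
    Applied to the question (many lat/lon points against county polygons), the whole lookup can be done in one join instead of one query per point. The sketch below is only an illustration under assumptions: the two square "counties", the column names county and point_id, and the coordinates are all hypothetical stand-ins for the real shapefile, and the predicate keyword assumes geopandas 0.10 or later.

```python
import geopandas
from shapely.geometry import Polygon

# Two hypothetical square "counties" standing in for the real shapefile.
counties = geopandas.GeoDataFrame(
    {"county": ["A", "B"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:4326",
)

# Made-up query coordinates; the third point lies outside both polygons.
lons = [0.5, 1.5, 5.0]
lats = [0.5, 0.5, 5.0]

# Build all query points at once (x is longitude, y is latitude).
points = geopandas.GeoDataFrame(
    {"point_id": [0, 1, 2]},
    geometry=geopandas.points_from_xy(lons, lats),
    crs="EPSG:4326",
)

# One spatial join answers every point-in-polygon query together.
matched = geopandas.sjoin(points, counties, how="inner", predicate="within")

# Map each matched point id to the county that contains it.
result = dict(zip(matched["point_id"].tolist(), matched["county"].tolist()))
print(result)
```

    With an inner join, the point that falls outside every polygon is simply absent from the result.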

    Example

    As an example, let's plot European cities. The two example datasets are loaded as follows:

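    import geopandas
    import matplotlib.pyplot as plt
    
    # Load the sample datasets bundled with geopandas
    # (geopandas.datasets was deprecated in 0.14 and removed in 1.0)
    world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
    cities = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))
    # Keep European countries; rename 'name' so it does not clash with the cities' 'name' column
    countries = world[world['continent'] == "Europe"].rename(columns={'name':'country'})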
    
    countries.head(2)
          pop_est  continent  country  iso_a3  gdp_md_est  geometry
    18  142257519     Europe   Russia     RUS   3745000.0  MULTIPOLYGON (((178.725 71.099, 180.000 71.516...
    21    5320045     Europe   Norway     -99    364700.0  MULTIPOLYGON (((15.143 79.674, 15.523 80.016, ...
    
    cities.head(2)
               name                   geometry
    0  Vatican City  POINT (12.45339 41.90328)
    1    San Marino  POINT (12.44177 43.93610)
    

    cities is a worldwide dataset and countries is a Europe-wide dataset.

    Both datasets need to be in the same coordinate reference system (CRS). If they are not, use .to_crs to reproject one of them before joining.
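
    A minimal sketch of that CRS check, assuming geopandas 0.10 or later. The point and the rectangle are made-up geometries (they are not the Natural Earth data), and the choice of EPSG:3857 for the polygon layer is only there to force a mismatch.

```python
import geopandas
from shapely.geometry import Point, Polygon

# Hypothetical point layer in lon/lat degrees (EPSG:4326).
pts = geopandas.GeoDataFrame(
    {"name": ["sample point"]},
    geometry=[Point(12.45, 41.90)],
    crs="EPSG:4326",
)

# Hypothetical polygon layer in Web Mercator metres (EPSG:3857).
polys = geopandas.GeoDataFrame(
    {"region": ["sample region"]},
    geometry=[Polygon([(0, 0), (2_000_000, 0),
                       (2_000_000, 6_000_000), (0, 6_000_000)])],
    crs="EPSG:3857",
)

# Reproject the points only if the two layers disagree.
if pts.crs != polys.crs:
    pts = pts.to_crs(polys.crs)

crs_match = pts.crs == polys.crs

# The join is only meaningful once both layers share a CRS.
joined = geopandas.sjoin(pts, polys, how="inner", predicate="within")
print(crs_match, len(joined))
```

    Without the reprojection, the degree coordinates would be compared against metre coordinates and the join could return wrong or misleading matches.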

    # geopandas >= 0.10 names this keyword `predicate`; older releases used `op`
    data_merged = geopandas.sjoin(cities, countries, how="inner", predicate='within')
    

    Finally, to see the result, let's draw a map:

    f, ax = plt.subplots(1, figsize=(20,10))
    # Draw both layers on the same matplotlib axes (the keyword is `ax`, not `axes`)
    data_merged.plot(ax=ax)
    countries.plot(ax=ax, alpha=0.25, linewidth=0.1)
    plt.show()
    

    The joined dataset combines the information we need from both sources:

    data_merged.head(5)
    
                 name                   geometry  index_right   pop_est  continent  country  iso_a3  gdp_md_est
    0    Vatican City  POINT (12.45339 41.90328)          141  62137802     Europe    Italy     ITA   2221000.0
    1      San Marino  POINT (12.44177 43.93610)          141  62137802     Europe    Italy     ITA   2221000.0
    192          Rome  POINT (12.48131 41.89790)          141  62137802     Europe    Italy     ITA   2221000.0
    2           Vaduz   POINT (9.51667 47.13372)          114   8754413     Europe  Austria     AUT    416600.0
    184        Vienna  POINT (16.36469 48.20196)          114   8754413     Europe  Austria     AUT    416600.0
    

    Here I used an inner join, but how is a parameter you can change. For instance, how="left" keeps all points, including those that are not within any polygon.
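
    A small sketch of the left-join variant, assuming geopandas 0.10 or later. The square, the two points, and the column names region and name are hypothetical; they only serve to show what happens to a point that no polygon contains.

```python
import geopandas
from shapely.geometry import Point, Polygon

# Hypothetical polygon layer with a single square region.
polys = geopandas.GeoDataFrame(
    {"region": ["R1"]},
    geometry=[Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])],
    crs="EPSG:4326",
)

# Hypothetical points: one inside the square, one far outside it.
pts = geopandas.GeoDataFrame(
    {"name": ["in", "out"]},
    geometry=[Point(0.5, 0.5), Point(3.0, 3.0)],
    crs="EPSG:4326",
)

# how="left" keeps every point; unmatched points get missing values
# in the columns that come from the polygon layer.
left = geopandas.sjoin(pts, polys, how="left", predicate="within")

# Names of the points that did not fall in any polygon.
unmatched = left[left["region"].isna()]["name"].tolist()
print(len(left), unmatched)
```

    Filtering on the missing values afterwards is a simple way to find points that fall outside every polygon.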
