Question
I have the following boundary dataset for the United Kingdom, which shows all the counties:
library(raster)
library(sp)
library(ggplot2)
# Download the data
GB <- getData('GADM', country="gbr", level=2)
Using the subset function, it is really easy to filter the shapefile polygons by an attribute in the data. For example, if I want to exclude Northern Ireland:
GB_sub <- subset(GB, NAME_1 != "Northern Ireland")
However, there are lots of small islands which distort the data range of the scale when the data are mapped.
Any thoughts on how to elegantly subset the dataset by a minimum size? Ideally the solution would have a format consistent with the subset function. For example:
GB_sub <- subset(GB, Area > 20) # specify minimum area in km^2
Answer 1:
Here is another potential solution. Because your data is in unprojected lat-long coordinates, calculating the area directly from latitude and longitude would be biased, so it is better to calculate the area with functions from the geosphere package.
install.packages("geosphere")
library(geosphere)
# Calculate the area
GB$poly_area <- areaPolygon(GB) / 10^6
# Filter GB based on area > 20 km2
GB_filter <- subset(GB, poly_area > 20)
poly_area contains the area in km2 for all polygons. We can filter the polygons by a threshold, such as 20 in your example. GB_filter is the final output.
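As a minimal sketch of the same steps on data that needs no download, the example below builds two hypothetical square polygons (the names toy, big and small are invented for illustration and are not part of the GADM data), computes their geodesic area with areaPolygon, and keeps those above 20 km2. It uses bracket indexing for the filter, on the assumption that this selects the same rows as the subset call above.

```r
library(sp)
library(geosphere)

# Two hypothetical squares in lon/lat degrees, rings given clockwise:
# "big" is 1 x 1 degree, "small" is 0.01 x 0.01 degree.
big <- Polygon(cbind(c(0, 0, 1, 1, 0), c(50, 51, 51, 50, 50)), hole = FALSE)
small <- Polygon(cbind(c(2, 2, 2.01, 2.01, 2), c(50, 50.01, 50.01, 50, 50)), hole = FALSE)

polys <- SpatialPolygons(
  list(Polygons(list(big), ID = "big"), Polygons(list(small), ID = "small")),
  proj4string = CRS("+proj=longlat +datum=WGS84")
)
toy <- SpatialPolygonsDataFrame(
  polys,
  data = data.frame(name = c("big", "small"), row.names = c("big", "small"))
)

# Geodesic area in m^2, converted to km^2
toy$poly_area <- areaPolygon(toy) / 10^6

# Keep only polygons larger than 20 km^2
toy_filter <- toy[toy$poly_area > 20, ]
print(as.character(toy_filter$name))
```

The one-degree square covers several thousand km2 at this latitude, while the 0.01-degree square is well under 20 km2, so only the first should survive the filter.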
Answer 2:
This is one potential solution:
GB_sub = GB[sapply(GB@polygons, function(x) x@area>0.04),] # select min size (area slot is in squared map units, here square degrees)
map.df <- fortify(GB_sub)
ggplot(map.df, aes(x=long, y=lat, group=group)) + geom_polygon()
See this question for specifics on how to interpret the area in terms of km2: "Getting a slot's value of S4 objects?"
I also compared the area slot with rgeos::gArea, and the two don't seem to differ:
out1 = sapply(GB@polygons, function(x) x@area)
out2 = rgeos::gArea(GB, byid=TRUE)
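The area slot holds a planar area in squared coordinate units, which is why the threshold in this answer is 0.04 rather than 20. A small sketch with two hypothetical squares (invented for illustration, not taken from the GADM data) puts the slot value next to the geodesic area from geosphere::areaPolygon:

```r
library(sp)
library(geosphere)

# Hypothetical squares in lon/lat degrees: 1 x 1 degree and 0.01 x 0.01 degree
big <- Polygon(cbind(c(0, 0, 1, 1, 0), c(50, 51, 51, 50, 50)), hole = FALSE)
small <- Polygon(cbind(c(2, 2, 2.01, 2.01, 2), c(50, 50.01, 50.01, 50, 50)), hole = FALSE)
polys <- SpatialPolygons(
  list(Polygons(list(big), ID = "big"), Polygons(list(small), ID = "small")),
  proj4string = CRS("+proj=longlat +datum=WGS84")
)

# Planar area stored in each Polygons object, in squared coordinate units
slot_area <- sapply(polys@polygons, function(x) x@area)

# Geodesic area in km^2 for the same polygons
geo_km2 <- areaPolygon(polys) / 10^6

print(data.frame(id = c("big", "small"), slot_area, geo_km2))
```

Because the ground length of a degree of longitude shrinks toward the poles, a fixed threshold in square degrees corresponds to a different number of km2 at different latitudes.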
Source: https://stackoverflow.com/questions/47078123/filter-shapefile-polygons-by-area