R - Cloropleth: Of the data points that fall within a polygon, what percentage have a specific column value?

问题

An extension of this thread: Create choropleth map from coordinate points. (I didn't want to combine the two threads for the sake of being relevant to as many people as possible.)

I have a data frame consisting of many observations, each with geocoordinates (latitude-longitude) and a boolean (yes-no) value. I would like to generate a choropleth map of the world where each region/polygon is shaded by the percentage of points within it where the associated boolean value is equal to true.

Here is a minimally reproducible example, which right now only shades according to the number of points in a polygon. The "like" column of data is my boolean.

# Load package
library(tidyverse)
library(ggmap)
library(maps)
library(maptools)
library(sf)

data <- data.frame(class = c("Private", "Private", "Private", "Private", "Private", "Private", "Not Private", "Not Private", "Private", "Private", "Not Private", "Private", "Not Private", "Private", "Private", "Not Private", "Not Private", "Private", "Private", "Not Private"),
                   lat = c(33.663944, 41.117936, 28.049601, 39.994684, 36.786042, 12.797659, 52.923318, 33.385555, 9.295242, 32.678207, 41.833585, -28.762956, 39.284713, 36.060964, 36.052239, 36.841764, 33.714237, 33.552863, 32.678207, -38.132401),
                   lon = c(-83.98686, -77.60468, -81.97271, -82.98577, -119.78246, 121.82814, -1.16057, -86.76009, 123.27758,   -83.17387, -87.73201, 32.05737, -76.62048, -115.13517, -79.39961, -76.35592, -85.85172, -112.12468, -83.17387, 144.36946))

# Convert to simple feature object
point_sf <- st_as_sf(data, coords = c("lon", "lat"), crs = 4326)

# Get world map data
worldmap <- maps::map("world", fill = TRUE, plot = FALSE)

# Convert world to sp class
IDs <- sapply(strsplit(worldmap$names, ":"), "[", 1L)
world_sp <- map2SpatialPolygons(worldmap, IDs = IDs, 
                                proj4string = CRS("+proj=longlat +datum=WGS84"))

# Convert world_sp to simple feature object
world_sf <- st_as_sf(world_sp)

# Add country ID
world_sf <- world_sf %>%
  mutate(region = map_chr(1:length(world_sp@polygons), function(i){
    world_sp@polygons[[i]]@ID
  }))

# Use st_within
result <- st_within(point_sf, world_sf, sparse = FALSE)

# Calculate the total count of each polygon
# Store the result as a new column "Count" in world_sf
world_sf <- world_sf %>%
  mutate(Count = apply(result, 2, sum))

# Convert world_sf to a data frame world_df 
world_df <- world_sf
st_geometry(world_df) <- NULL

# Get world data frame
world_data <- map_data("world")

# Merge world_data and world_df
world_data2 <- world_data %>%
  left_join(world_df, by = c("region"))

ggplot() + 
  geom_polygon(data = world_data2, aes(x = long, y = lat, group = group, fill = Count)) +
  coord_fixed(1.3)

Special thanks to https://stackoverflow.com/users/7669809/ycw for the help thus far.

回答1:

We can first count how many points in a polygon, filter the datasets for records labeled as Private in the class column, and then count how many points in the polygon again. We can later calculate the percentage by using Private count number dividing by all count number and multipling by 100%.

One nice feature about the sf object is it is also a data frame. So operations to manage a data frame, such as filter from the dplyr package, also work for the sf object. So we can use command like point_private_sf <- point_sf %>% filter(class %in% "Private") to filter the points easily.

# Load package
library(tidyverse)
library(maps)
library(maptools)
library(sf)

### Data Preparation

data <- data.frame(class = c("Private", "Private", "Private", "Private", "Private", "Private", "Not Private", "Not Private", "Private", "Private", "Not Private", "Private", "Not Private", "Private", "Private", "Not Private", "Not Private", "Private", "Private", "Not Private"),
                   lat = c(33.663944, 41.117936, 28.049601, 39.994684, 36.786042, 12.797659, 52.923318, 33.385555, 9.295242, 32.678207, 41.833585, -28.762956, 39.284713, 36.060964, 36.052239, 36.841764, 33.714237, 33.552863, 32.678207, -38.132401),
                   lon = c(-83.98686, -77.60468, -81.97271, -82.98577, -119.78246, 121.82814, -1.16057, -86.76009, 123.27758,   -83.17387, -87.73201, 32.05737, -76.62048, -115.13517, -79.39961, -76.35592, -85.85172, -112.12468, -83.17387, 144.36946))

# Convert to simple feature object
point_sf <- st_as_sf(data, coords = c("lon", "lat"), crs = 4326)

# Get world map data
worldmap <- maps::map("world", fill = TRUE, plot = FALSE)

# Convert world to sp class
IDs <- sapply(strsplit(worldmap$names, ":"), "[", 1L)
world_sp <- map2SpatialPolygons(worldmap, IDs = IDs, 
                                proj4string = CRS("+proj=longlat +datum=WGS84"))

# Convert world_sp to simple feature object
world_sf <- st_as_sf(world_sp)

# Add country ID
world_sf <- world_sf %>%
  mutate(region = map_chr(1:length(world_sp@polygons), function(i){
    world_sp@polygons[[i]]@ID
  }))

### Use st_within for the analysis

# Use st_within for all points
result_all <- st_within(point_sf, world_sf, sparse = FALSE)

# Filter the points by "Private" in the class column
point_private_sf <- point_sf %>% filter(class %in% "Private")

# Use st_within for private points
result_private <- st_within(point_private_sf, world_sf, sparse = FALSE)

### Calculate the total count of each polygon
# Store the result as ew columns "Count_all" and "Count_private" in world_sf
world_sf <- world_sf %>%
  mutate(Count_all = apply(result_all, 2, sum),
         Count_private = apply(result_private, 2, sum)) %>%
  # Calculate the percentage
  mutate(Percent = ifelse(Count_all == 0, Count_all, Count_private/Count_all * 100))

### Plot the data

# Convert world_sf to a data frame world_df 
world_df <- world_sf
st_geometry(world_df) <- NULL

# Get world data frame
world_data <- map_data("world")

# Merge world_data and world_df
world_data2 <- world_data %>%
  left_join(world_df, by = c("region"))

ggplot() + 
  geom_polygon(data = world_data2, aes(x = long, y = lat, group = group, fill = Percent)) +
  coord_fixed(1.3)

来源：https://stackoverflow.com/questions/45899768/r-cloropleth-of-the-data-points-that-fall-within-a-polygon-what-percentage-h

标签

ggplot2

gis

choropleth