Question
I am using the getData function from the raster package to retrieve the map of Argentina. I would like to plot the resulting map using ggplot2, so I am converting to a dataframe using the tidy function from the broom package. This works fine, but I can't figure out how to preserve the names of the federal districts so that I can use them on the map.
Here is my original code that does not preserve the district names:
# Original code: ##################################
library(magrittr)  # provides the %>% pipe
# get the map data from GADM.org and then simplify it
arg_map_1 <- raster::getData(country = "ARG", level = 1, path = "./data/") %>%
  # simplify
  rmapshaper::ms_simplify(keep = 0.01) %>%
  # tidy to a dataframe
  broom::tidy()
# plot the map
library(ggplot2)
ggplot(data=arg_map_1) +
  geom_map(map=arg_map_1, aes(x=long, y=lat, map_id=id, fill=id),
           color="#000000", size=0.25)
And here is the code with a hack to pull the district names out of the SPDF and use them as the map IDs:
# Code with a hack to keep the district names: ################################
library(magrittr)  # provides the %>% pipe
# get the map data from GADM.org and then simplify it
arg_map_1 <- raster::getData(country = "ARG", level = 1, path = "./data/") %>%
  # simplify
  rmapshaper::ms_simplify(keep = 0.01)
# overwrite each polygon's ID with the matching district name
for(region_looper in seq_along(arg_map_1@data$NAME_1)){
  arg_map_1@polygons[[region_looper]]@ID <-
    as.character(arg_map_1@data$NAME_1[region_looper])
}
# tidy to a dataframe
arg_map_1 <- arg_map_1 %>%
  broom::tidy()
# plot the map
library(ggplot2)
ggplot(data=arg_map_1) +
  geom_map(map=arg_map_1, aes(x=long, y=lat, map_id=id, fill=id),
           color="#000000", size=0.25)
I keep thinking that there must be some way to use the tidy function that preserves the names, but for the life of me, I can't figure it out.
Answer 1:
You can use the join function from the plyr package. Here is a general solution (it looks long, but it is actually very easy):
Load shapefile: Let us say you have a shapefile my_shapefile.shp in your working directory. Let's load it (readOGR comes from the rgdal package):
shape <- readOGR(dsn = "/my_working_directory", layer = "my_shapefile")
Notice that inside this shapefile there is a dataframe, which can be accessed with shape@data. For example, this dataframe could look like this:
> head(shape@data)
       code                   region     label
0 E12000006          East of England E12000006
1 E12000007                   London E12000007
2 E12000002               North West E12000002
3 E12000001               North East E12000001
4 E12000004            East Midlands E12000004
5 E12000003 Yorkshire and The Humber E12000003
Create new dataframe from shapefile: Use the broom package to tidy the shapefile dataframe:
new_df <- tidy(shape)
This results in something like this:
> head(new_df)
      long      lat order  hole piece group id
1 547491.0 193549.0     1 FALSE     1   0.1  0
2 547472.1 193465.5     2 FALSE     1   0.1  0
3 547458.6 193458.2     3 FALSE     1   0.1  0
4 547455.6 193456.7     4 FALSE     1   0.1  0
5 547451.2 193454.3     5 FALSE     1   0.1  0
6 547447.5 193451.4     6 FALSE     1   0.1  0
Unfortunately, tidy() dropped the attribute columns (such as "region" in this example). Instead, we got a new variable "id", starting at 0. Fortunately, the ordering of "id" is the same as that stored in shape@data$region. Let us use this to recover the names.
Create auxiliary dataframe with row names: Let us create a new dataframe with the row names. Additionally, we will add an "id" variable, identical to the one tidy() created:
# Recover row name
temp_df <- data.frame(shape@data$region)
names(temp_df) <- c("region")
# Create and append "id"
temp_df$id <- seq(0,nrow(temp_df)-1)
Merge row names with new dataframe using "id": Finally, let us put the names back into the new dataframe:
new_df <- join(new_df, temp_df, by="id")
That's it! You can even add more variables to the new dataframe by using the join command and the "id" index. The final result would be something like:
> head(new_df)
      long      lat order  hole piece group id          region  var1  var2
1 547491.0 193549.0     1 FALSE     1   0.1  0 East of England 0.525 0.333
2 547472.1 193465.5     2 FALSE     1   0.1  0 East of England 0.525 0.333
3 547458.6 193458.2     3 FALSE     1   0.1  0 East of England 0.525 0.333
4 547455.6 193456.7     4 FALSE     1   0.1  0 East of England 0.525 0.333
5 547451.2 193454.3     5 FALSE     1   0.1  0 East of England 0.525 0.333
6 547447.5 193451.4     6 FALSE     1   0.1  0 East of England 0.525 0.333
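The id-based join described in this answer can also be tried out without a shapefile. The following is only a minimal sketch in base R: the two small data frames are made-up stand-ins for tidy(shape) and shape@data (not real output), merge() is used in place of plyr's join, and it assumes, as the answer does, that the ids run from 0 to n-1 in the same order as the attribute rows.

```r
# Toy stand-in for tidy(shape): one row per vertex, with a character "id"
# (hypothetical values, not real coordinates)
new_df <- data.frame(
  long  = c(1.0, 1.5, 2.0, 5.0, 5.5),
  lat   = c(10.0, 10.5, 11.0, 20.0, 20.5),
  order = 1:5,
  id    = c("0", "0", "0", "1", "1"),
  stringsAsFactors = FALSE
)

# Toy stand-in for shape@data: one row per polygon
attr_df <- data.frame(
  region = c("East of England", "London"),
  stringsAsFactors = FALSE
)

# Lookup table: ids 0..n-1 in the same order as the attribute rows.
# Stored as character so the key type matches the "id" column above.
temp_df <- data.frame(
  id     = as.character(seq(0, nrow(attr_df) - 1)),
  region = attr_df$region,
  stringsAsFactors = FALSE
)

# Left join on "id"; merge() may reorder rows, so restore the vertex order
joined <- merge(new_df, temp_df, by = "id", all.x = TRUE)
joined <- joined[order(joined$order), ]

print(joined)
```

Restoring the "order" column after the join matters for maps, because polygons are drawn by connecting vertices in row order.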
Answer 2:
alistaire's comment pushed me to keep pushing on the region= parameter. I tried many iterations and found some ideas in this thread: https://github.com/tidyverse/ggplot2/issues/1447.
Here is the code that grabs the district names:
# load the magrittr library to get the pipe
library(magrittr)
# load the maptools library to get the rgeos object
library(maptools)
arg_map_1 <- raster::getData(country = "ARG", level = 1, path = "./data/") %>%
  # simplify
  rmapshaper::ms_simplify(keep = 0.01) %>%
  # tidy to a dataframe
  broom::tidy(region="NAME_1")
# plot the map
library(ggplot2)
ggplot(data=arg_map_1) +
  geom_map(map=arg_map_1, aes(x=long, y=lat, map_id=id, fill=id),
           color="#000000", size=0.25)
First of all, notice that the maptools library must be loaded in order for the tidy operation to work correctly. Also, I want to highlight that the variable to extract the region information from must be enclosed in quotes. I had been assuming incorrectly that broom would recognize the variable name in the same way that other tidyverse packages such as dplyr recognize column names unquoted or surrounded by backticks.
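The difference between a quoted and an unquoted column name can be illustrated with a small base R sketch. The helper pick_labels() below is hypothetical and is not how broom is implemented; it only mimics a function that expects the column name as a string, and the two-row data frame is made up for the example.

```r
# Hypothetical helper (not part of broom): expects the column NAME as a string
pick_labels <- function(data, region) {
  stopifnot(is.character(region), length(region) == 1)
  data[[region]]
}

# Made-up attribute table with a NAME_1 column
districts <- data.frame(
  NAME_1 = c("Buenos Aires", "Mendoza"),
  stringsAsFactors = FALSE
)

# Quoted name: the string is used to look up the column
district_labels <- pick_labels(districts, region = "NAME_1")

# Bare name: R tries to evaluate an object called NAME_1, which does not
# exist in the calling environment, so the call raises an error
bare <- tryCatch(
  pick_labels(districts, region = NAME_1),
  error = function(e) "error"
)

print(district_labels)
print(bare)
```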
Source: https://stackoverflow.com/questions/40576457/keep-region-names-when-tidying-a-map-using-broom-package