keep region names when tidying a map using broom package

Submitted by 隐身守侯 on 2020-01-02 08:44:30

Question


I am using the getData function from the raster package to retrieve the map of Argentina. I would like to plot the resulting map using ggplot2, so I am converting to a dataframe using the tidy function from the broom package. This works fine, but I can't figure out how to preserve the names of the federal districts so that I can use them on the map.

Here is my original code that does not preserve the district names:

# Original code: ##################################
library(magrittr)  # provides the %>% pipe
library(ggplot2)

# get the map data from GADM.org and then simplify it
arg_map_1 <- raster::getData(country = "ARG", level = 1, path = "./data/") %>%
  # simplify
  rmapshaper::ms_simplify(keep = 0.01) %>%
  # tidy to a dataframe
  broom::tidy()

# plot the map
ggplot(data = arg_map_1) +
  geom_map(map = arg_map_1, aes(x = long, y = lat, map_id = id, fill = id),
           color = "#000000", size = 0.25)

And here is the code with a hack to pull the district names out of the SPDF and use them as the map IDs:

# Code with a hack to keep the district names: ################################
library(magrittr)  # provides the %>% pipe

# get the map data from GADM.org and then simplify it
arg_map_1 <- raster::getData(country = "ARG", level = 1, path = "./data/") %>%
  # simplify
  rmapshaper::ms_simplify(keep = 0.01)

for(region_looper in seq_along(arg_map_1@data$NAME_1)){
  arg_map_1@polygons[[region_looper]]@ID <- 
    as.character(arg_map_1@data$NAME_1[region_looper]) 
}

# tidy to a dataframe
arg_map_1 <- arg_map_1 %>% 
  broom::tidy()

library(ggplot2)
ggplot(data=arg_map_1) +
  geom_map(map=arg_map_1, aes(x=long, y=lat, map_id=id, fill=id),
           color="#000000", size=0.25)

I keep thinking that there must be some way to use the tidy function that preserves the names, but for the life of me, I can't figure it out.


Answer 1:


You can use the join function from package plyr. Here is a general solution (it looks long but it is actually very easy):

  1. Load shapefile: Let us say you have a shapefile my_shapefile.shp in your working directory. Let's load it:

    library(rgdal)  # provides readOGR()
    shape <- readOGR(dsn = "/my_working_directory", layer = "my_shapefile")
    

    Notice that the loaded object carries an attribute dataframe, which can be accessed with shape@data. For example, this dataframe could look like this:

    > head(shape@data)
           code                   region     label
    0 E12000006          East of England E12000006
    1 E12000007                   London E12000007
    2 E12000002               North West E12000002
    3 E12000001               North East E12000001
    4 E12000004            East Midlands E12000004
    5 E12000003 Yorkshire and The Humber E12000003
    
  2. Create new dataframe from shapefile: Use the broom package to tidy the shapefile object:

    new_df <- tidy(shape)
    

This results in something like this:

> head(new_df)
      long      lat order  hole piece group id           
1 547491.0 193549.0     1 FALSE     1   0.1  0 
2 547472.1 193465.5     2 FALSE     1   0.1  0 
3 547458.6 193458.2     3 FALSE     1   0.1  0 
4 547455.6 193456.7     4 FALSE     1   0.1  0 
5 547451.2 193454.3     5 FALSE     1   0.1  0 
6 547447.5 193451.4     6 FALSE     1   0.1  0

Unfortunately, tidy() lost the region names (the "region" variable, in this example). Instead, we got a new variable "id", starting at 0. Fortunately, "id" matches the row names of shape@data (0, 1, 2, ... in the printout above), so its ordering is the same as that of shape@data$region. Let us use this to recover the names.

  3. Create auxiliary dataframe with region names: Let us create a new dataframe holding the region names. Additionally, we will add an "id" variable that matches the one tidy() created:

    # Recover the region names
    temp_df <- data.frame(region = shape@data$region)
    # Create and append "id" (as character, like the "id" column from tidy())
    temp_df$id <- as.character(seq(0, nrow(temp_df) - 1))
    
  4. Merge region names with new dataframe using "id": Finally, let us put the names back into the new dataframe:

    new_df <- plyr::join(new_df, temp_df, by = "id")
    

That's it! You can add more variables to the new dataframe in the same way, by using the join command and the "id" index. With two further variables joined on (here called var1 and var2), the final result would be something like:

> head(new_df)
      long      lat order  hole piece group id          region    var1    var2
1 547491.0 193549.0     1 FALSE     1   0.1  0 East of England   0.525   0.333
2 547472.1 193465.5     2 FALSE     1   0.1  0 East of England   0.525   0.333
3 547458.6 193458.2     3 FALSE     1   0.1  0 East of England   0.525   0.333
4 547455.6 193456.7     4 FALSE     1   0.1  0 East of England   0.525   0.333
5 547451.2 193454.3     5 FALSE     1   0.1  0 East of England   0.525   0.333
6 547447.5 193451.4     6 FALSE     1   0.1  0 East of England   0.525   0.333
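
The id-keyed join in this answer can be tried without a shapefile. The sketch below is only an illustration: it uses base R's merge() in place of plyr::join(), and two small hand-made dataframes (hypothetical stand-ins for shape@data and for the tidy() output), so none of the values come from real map data. It assumes that the "id" values equal the row names of the attribute table.

```r
# Hypothetical stand-in for shape@data: one row per region, row names "0", "1"
attrs <- data.frame(
  region = c("East of England", "London"),
  row.names = c("0", "1"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the tidy() output: one row per vertex,
# with "id" stored as character
tidied <- data.frame(
  long  = c(1, 2, 3, 4),
  lat   = c(5, 6, 7, 8),
  order = 1:4,
  id    = c("0", "0", "1", "1"),
  stringsAsFactors = FALSE
)

# Lookup table keyed on the row names rather than on a regenerated sequence
lookup <- data.frame(
  id     = rownames(attrs),
  region = attrs$region,
  stringsAsFactors = FALSE
)

# Left join on "id", then restore the vertex order, which merge() may change
joined <- merge(tidied, lookup, by = "id", all.x = TRUE, sort = FALSE)
joined <- joined[order(joined$order), ]

print(joined)
```

Taking the key from rownames() avoids relying on the ids being a gap-free sequence starting at 0, and re-sorting by "order" matters because polygons are drawn by connecting vertices in row order.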



Answer 2:


alistaire's comment prompted me to keep experimenting with the region= parameter. I tried many iterations and I found some ideas in this thread: https://github.com/tidyverse/ggplot2/issues/1447.

Here is the code that grabs the district names:

# load the magrittr library to get the pipe
library(magrittr)
# load the maptools library, which tidy() needs when region= is supplied
library(maptools)

arg_map_1 <- raster::getData(country = "ARG", level = 1, path = "./data/") %>% 
  # simplify
  rmapshaper::ms_simplify(keep = 0.01) %>% 
  # tidy to a dataframe
  broom::tidy(region="NAME_1")

# plot the map
library(ggplot2)
ggplot(data=arg_map_1) +
  geom_map(map=arg_map_1, aes(x=long, y=lat, map_id=id, fill=id),
           color="#000000", size=0.25)

First of all, notice that the maptools library must be loaded in order for the tidy operation to work correctly. Also, I want to highlight that the variable to extract the region information from must be enclosed in quotes. I had been assuming incorrectly that broom would recognize the variable name in the same way that other tidyverse packages such as dplyr recognize column names unquoted or surrounded by backticks.
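
The quoting point can be illustrated in plain R, without any of the mapping packages. In the sketch below, pick_region() is a hypothetical helper (not part of broom) that takes a column name as a string, which is how the region argument behaved for me in the code above; the small attrs dataframe is made up for the example.

```r
# Hypothetical helper using standard evaluation: "region" must be a string
pick_region <- function(df, region) {
  df[[region]]
}

# Made-up attribute table with a NAME_1 column
attrs <- data.frame(
  NAME_1 = c("Buenos Aires", "Salta"),
  stringsAsFactors = FALSE
)

# Quoted column name: works
res_quoted <- pick_region(attrs, "NAME_1")

# Unquoted column name: R looks for a variable called NAME_1 and fails,
# so the error message is captured instead of a result
res_unquoted <- tryCatch(
  pick_region(attrs, NAME_1),
  error = function(e) conditionMessage(e)
)

print(res_quoted)
print(res_unquoted)
```

Functions such as dplyr::select() capture the unevaluated column name themselves, which is why the unquoted form works there but not with a function that expects a plain string.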



Source: https://stackoverflow.com/questions/40576457/keep-region-names-when-tidying-a-map-using-broom-package
