Choropleth Maps in R - TIGER Shapefile issue

一个人的身影 2021-01-14 02:13

I have a question about mapping with R, specifically about choropleth maps.

I have a dataset of ZIP codes assigned to an area and some associated data (dataset i

2 Answers
  • 2021-01-14 02:53

    I would advise the following.

    • Use readOGR from the rgdal package rather than readShapeSpatial.
    • Consider using ggplot2 for good-looking maps - many of the examples use this.
    • Refer to one of the existing examples of creating a choropleth such as this one to get an overview.
    • Start with a simple choropleth and gradually add your own data; don't try to get it all right at once.
    • If you need more help, create a reproducible example with a SMALL fake dataset and with links to the shapefiles in question. The idea is to make it easy for us to help you, rather than discourage us by not supplying code and data in your question.
  • 2021-01-14 02:59

    There are several examples and tutorials on making maps using R, but most are very general and, unfortunately, most map projects have nuances that create inscrutable problems. Yours is a case in point.

    The biggest issue I came across was that the US Census Bureau ZIP Code Tabulation Area (ZCTA) shapefile for the whole US is huge: ~800MB. When loaded using readOGR(...), the R SpatialPolygonsDataFrame object is about 913MB. Trying to process an object this size (e.g., converting it to a data frame using fortify(...)) resulted, at least on my system, in errors like the one you identified above. So the solution is to subset the object based on the zip codes that are actually in your data.
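
    As a minimal sketch of the subsetting idea, the base R snippet below uses a small made-up data frame in place of zips@data (the names zcta and sample_data and all values are hypothetical). It restores the leading zeros on the ZIP codes and then keeps only the matching rows.

```r
# Toy stand-in for zips@data: one row per polygon (hypothetical values)
zcta <- data.frame(ZCTA5CE10 = c("07030", "10001", "90210", "60601"),
                   stringsAsFactors = FALSE)

# Toy stand-in for Sample.csv, with ZIP read as a number (leading zero lost)
sample_data <- data.frame(ZIP = c(7030, 10001), Probability_1 = c(0.2, 0.8))

# Restore 5-character ZIP codes with leading zeros (base R alternative to str_pad)
sample_data$ZIP <- sprintf("%05d", as.integer(sample_data$ZIP))

# Keep only the polygons whose ZIP appears in the data
keep        <- zcta$ZCTA5CE10 %in% sample_data$ZIP
subset_zcta <- zcta[keep, , drop = FALSE]

# Compare object sizes before and after subsetting
print(object.size(zcta), units = "Kb")
print(object.size(subset_zcta), units = "Kb")
```

    The same kind of logical index can be applied to the Spatial object itself, as in zips[zips$ZCTA5CE10 %in% data$ZIP,] in the code that follows.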

    The following code produces the map from your data.

    library(rgdal)
    library(ggplot2)
    library(stringr)
    library(RColorBrewer)
    
    setwd("<directory containing shapefiles and sample data>")
    
    data     <- read.csv("Sample.csv",header=TRUE) # your sample data, downloaded as csv
    data$ZIP <- str_pad(data$ZIP,5,"left","0") # convert ZIP to char(5) w/leading zeros
    
    zips     <- readOGR(dsn=".","tl_2013_us_zcta510") # import zip code polygon shapefile
    map      <- zips[zips$ZCTA5CE10 %in% data$ZIP,]   # extract only zips in your Sample.csv
    map.df   <- fortify(map)        # convert to data frame suitable for plotting
    # merge data from Sample.csv into map data frame
    map.data <- data.frame(id=rownames(map@data),ZIP=map@data$ZCTA5CE10)
    map.data <- merge(map.data,data,by="ZIP")
    map.df   <- merge(map.df,map.data,by="id")
    map.df   <- map.df[order(map.df$order),] # merge() may reorder rows; restore vertex order
    # load state boundaries
    states <- readOGR(dsn=".","gz_2010_us_040_00_5m")
    states <- states[states$NAME %in% c("New York","New Jersey"),] # extract NY and NJ
    states.df <- fortify(states)    # convert to data frame suitable for plotting
    
    ggMap <- ggplot(data = map.df, aes(long, lat, group = group)) 
    ggMap <- ggMap + geom_polygon(aes(fill = Probability_1))
    ggMap <- ggMap + geom_path(data=states.df, aes(x=long,y=lat,group=group))
    ggMap <- ggMap + scale_fill_gradientn(name="Probability",colours=brewer.pal(9,"Reds"))
    ggMap <- ggMap + coord_equal()
    ggMap
    

    Explanation:

    The rgdal package facilitates the creation of R Spatial objects from ESRI shapefiles. In your case we are importing a polygon shapefile into a SpatialPolygonsDataFrame object in R. The latter has two main parts: a polygon section, which contains the latitude and longitude points that will be joined to create the polygons on the map, and a data section, which contains information about the polygons (so, one row for each polygon). If, e.g., we call the Spatial object map, then the two sections can be referenced as map@polygons and map@data. The basic challenge in making choropleth maps is to associate data from your Sample.csv file with the relevant polygons (zip codes).

    So the basic workflow is as follows:

    1. Load polygon shapefiles into Spatial object ( => zips)
    2. Subset if appropriate ( => map).
    3. Convert to data frame suitable for plotting ( => map.df).
    4. Merge data from Sample.csv into map.df.
    5. Draw the map.
    

    Step 4 is the one that causes all the problems. First we have to associate zip codes with each polygon. Then we have to associate Probability_1 with each zip code. This is a three-step process.

    Each polygon in the Spatial object has a unique ID, but these IDs are not the zip codes. The polygon IDs are stored as row names in map@data. The zip codes are stored in map@data, in column ZCTA5CE10. So first we must create a data frame that associates the map@data row names (id) with map@data$ZCTA5CE10 (ZIP). Then we merge your Sample.csv with the result using the ZIP field in both data frames. Then we merge the result of that into map.df, using the id field in both. This can be done in 3 lines of code.
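
    To see the three merges in isolation, here is a minimal base R sketch that needs no shapefile. The data frames are made-up stand-ins: map_data_slot plays the role of map@data, map.df the role of the fortify(...) output, and data the role of Sample.csv. All values, including the polygon IDs "12" and "57", are hypothetical.

```r
# Toy stand-in for map@data: row names are polygon IDs, ZCTA5CE10 holds ZIPs
map_data_slot <- data.frame(ZCTA5CE10 = c("07030", "10001"),
                            row.names = c("12", "57"),
                            stringsAsFactors = FALSE)

# Toy stand-in for fortify(map): one row per vertex, id = polygon ID
map.df <- data.frame(long  = c(0, 1, 1, 0, 2, 3, 3, 2),
                     lat   = c(0, 0, 1, 1, 0, 0, 1, 1),
                     order = 1:8,
                     id    = rep(c("12", "57"), each = 4),
                     stringsAsFactors = FALSE)

# Toy stand-in for Sample.csv
data <- data.frame(ZIP = c("10001", "07030"),
                   Probability_1 = c(0.8, 0.2),
                   stringsAsFactors = FALSE)

# 1. Build the polygon ID <-> ZIP lookup
map.data <- data.frame(id  = rownames(map_data_slot),
                       ZIP = map_data_slot$ZCTA5CE10,
                       stringsAsFactors = FALSE)
# 2. Attach Probability_1 to each ZIP
map.data <- merge(map.data, data, by = "ZIP")
# 3. Attach ZIP and Probability_1 to every vertex
map.df <- merge(map.df, map.data, by = "id")

# merge() does not promise to keep the vertex order, so restore it
map.df <- map.df[order(map.df$order), ]
print(map.df)
```

    Sorting by the order column at the end is a precaution: polygons are drawn by connecting vertices in row order, so shuffled rows would produce scrambled shapes.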

    Drawing the map involves telling ggplot what dataset to use (map.df), which columns to use for x and y (long and lat) and how to group the data by polygon (group=group). The columns long, lat, and group in map.df are all created by the call to fortify(...). The call to geom_polygon(...) tells ggplot to draw polygons and fill using the information in map.df$Probability_1. The call to geom_path(...) tells ggplot to create a layer with state boundaries. The call to scale_fill_gradientn(...) tells ggplot to use a color scheme based on the color brewer "Reds" palette. Finally, the call to coord_equal(...) tells ggplot to use the same scale for x and y so the map is not distorted.
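
    The same layers can be tried on a toy data frame before pointing them at real shapefiles. This is only a sketch: the two unit squares, the group labels, and the three hex colours (used here in place of brewer.pal so that RColorBrewer is not needed) are all made up for illustration.

```r
library(ggplot2)

# Two unit squares standing in for ZIP code polygons (hypothetical values)
toy <- data.frame(
  long  = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c("a.1", "b.1"), each = 4),
  Probability_1 = rep(c(0.2, 0.8), each = 4)
)

p <- ggplot(toy, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = Probability_1)) +            # one filled polygon per group
  scale_fill_gradientn(name = "Probability",
                       colours = c("#FFF5F0", "#FB6A4A", "#67000D")) +
  coord_equal()                                        # same scale on x and y
p
```

    Once this draws two squares in different shades of red, swapping toy for map.df and adding the geom_path(...) layer for the state boundaries gives the structure of the full map.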

    NB: The state boundary layer uses the US States TIGER file.
