问题
I'm trying to produce a choropleth map of county level data on COVID-19 infections using R. I'm a relative newbie to R so....
I've done some fairly basic stuff with ggmap to plot spatial data, but never anything quite like this. Typically I just have points of interest that I need to overlay on a map, so I can use geom_point and their lat/lon. In this case I need to construct the underlying map and then fill regions and I'm struggling with doing that in the ggplot world.
I've followed some online examples I've found to get as far as this:
library(ggplot2)
library(broom)
library(geojsonio)
#get a county level map geoJSON file
counties <- geojson_read("https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_500k.json", what = "sp")
#filter our alaska and Hawaii
lower48 <- counties[(counties@data$STATE != "02" & counties@data$STATE != "15") ,]
#turn it into a dataframe for ggmap
new_counties <- tidy(lower48)
# Plot it
print(ggplot() +
geom_polygon(data = new_counties, aes( x = long, y = lat, group = group), fill="#69b3a2", color="white") +
theme_void() +
coord_map())
Which produces this plot:
So far so good. But my new_counties dataframe now looks like this:
head(new_counties)
# A tibble: 6 x 7
long lat order hole piece group id
<dbl> <dbl> <int> <lgl> <chr> <chr> <chr>
1 -85.4 33.9 1 FALSE 1 0.1 0
2 -85.4 33.9 2 FALSE 1 0.1 0
3 -85.4 33.9 3 FALSE 1 0.1 0
4 -85.4 33.9 4 FALSE 1 0.1 0
5 -85.4 33.9 5 FALSE 1 0.1 0
6 -85.4 33.8 6 FALSE 1 0.1 0
So I've lost anything that I might be able to tie back to my county level data on infections.
My data has a 5-digit FIPS code for each county. First two digits are the state and last three are the county. My geoJSON file has a much more detailed FIPS code. I tried grabbing just the first 5 and creating my own data element I could map back to
library(ggplot2)
library(broom)
library(geojsonio)
#get a county level map geoJSON file
counties <- geojson_read("https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_500k.json", what = "sp")
#filter our alaska and Hawaii
lower48 <- counties[(counties@data$STATE != "02" & counties@data$STATE != "15") ,]
#add my own FIPS code
lower48@data$myFIPS <- substr(as.character(lower48@data$GEO_ID),1,5)
#turn it into a dataframe for ggmap
new_counties <- tidy(lower48, region = "myFIPS")
# Plot it
print(ggplot() +
geom_polygon(data = new_counties, aes( x = long, y = lat, group = group), fill="#69b3a2", color="white") +
theme_void() +
coord_map())
But that produces this plot
And I have to say I'm not quite familiar enough with broom::tidy to know exactly why. I also notice as I type this that I need to filter out Puerto Rico!
If anybody can point me back in a useful direction....I'm not wedded to the current approach, though I would like to stick to ggplot2 or ggmap. My boss eventually wants me to overlay certain features. Ultimately the goal is to follow the example here and produce an animated map showing data over time, but I'm obviously a long way from that.
回答1:
There's many ways to do this, but the concept is basically the same: Find a map containing country level FIPS codes and use them to link with a data source, also containing the same FIPS codes as well as the variable for plotting (here the number of covid-19 cases per day).
#devtools::install_github("UrbanInstitute/urbnmapr")
library(urbnmapr) # For map
library(ggplot2) # For map
library(dplyr) # For summarizing
library(tidyr) # For reshaping
library(stringr) # For padding leading zeros
# Get COVID cases, available from:
url <- "https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv
?_ga=2.162130428.136323622.1585096338-408005114.1585096338"
COV <- read.csv(url, stringsAsFactors = FALSE)
names(COV)[1] <- "countyFIPS" # Fix the name of first column. Why!?
The data are stored in wide format with daily cases per county spread across columns. This needs to be gathered before merging with the map. The dates need to be converted to proper dates. The FIPS codes are stored as integers, so these need to be converted to a character with leading 0s in order to merge with the map data. I use the urbnmap package for the map data.
Covid <- pivot_longer(COV, cols=starts_with("X"),
values_to="cases",
names_to=c("X","date_infected"),
names_sep="X") %>%
mutate(date_infected = as.Date(date_infected, format="%m.%d.%Y"),
countyFIPS = str_pad(as.character(countyFIPS), 5, pad="0"))
# Obtain map data for counties (to link with covid data) and states (for showing borders)
states_sf <- get_urbn_map(map = "states", sf = TRUE)
counties_sf <- get_urbn_map(map = "counties", sf = TRUE)
# Merge county map with total cases of cov
counties_cov <- inner_join(counties_sf, group_by(Covid, countyFIPS) %>%
summarise(cases=sum(cases)), by=c("county_fips"="countyFIPS"))
counties_cov %>%
ggplot() +
geom_sf(mapping = aes(fill = cases), color = NA) +
geom_sf(data = states_sf, fill = NA, color = "black", size = 0.25) +
coord_sf(datum = NA) +
scale_fill_gradient(name = "Cases", trans = "log", low='pink', high='navyblue',
na.value="white", breaks=c(1, max(counties_cov$cases))) +
theme_bw() + theme(legend.position="bottom", panel.border = element_blank())
For animation, you can use the gganimate package and transition through the days. The commands are similar to above except that the covid data should not be summarized.
library(gganimate)
counties_cov <- inner_join(counties_sf, Covid, by=c("county_fips"="countyFIPS"))
p <- ggplot(counties_cov) + ... # as above
p <- p + transition_time(date_infected) +
labs(title = 'Date: {frame_time}')
animate(p, end_pause=30)
来源:https://stackoverflow.com/questions/60840971/creating-a-choropleth-map-with-us-county-level-data