SpatialPoly Data: SpatialData
Yield Data: Yield Data
Code:
## Loading packages
library(rgdal)
library(plyr)
library(maps)
lib
First, let me apologize for taking so long to get back - I missed your comment among all the others. Is this what you had in mind?
This was produced with the following code. Before getting into an explanation, you should be aware that creating a legend is the least of your problems. Note how the colors are different in the two maps. Your code above does not assign CO2 changes to the correct regions. For example, according to MoroccoYields.csv
, the largest change (improvement?) was -0.205
in Region 4
, but on your map the largest (darkest red) is at the northeastern tip of Morocco, which is actually l'Oriental (Region 6)
. An explanation follows the code.
## Loading packages
library(rgdal)
library(plyr)
library(maps)
library(maptools)
library(mapdata)
library(ggplot2)
library(RColorBrewer)
library(foreign)
library(sp)
# get.centroids: function to extract polygon ID and centroid from shapefile
get.centroids = function(x){
poly = MoroccoReg@polygons[[x]]
ID = poly@ID
centroid = as.numeric(poly@labpt)
return(c(id=ID, long=centroid[1], lat=centroid[2]))
}
#setwd("Directory where shapefile and Yields are stored")
## Loading shapefiles and .csv files
MoroccoReg <- readOGR(dsn=".", layer="Morocco_adm1")
MoroccoYield <- read.csv(file = "Morocco_Yield.csv", header=TRUE, sep=",", na.string="NA", dec=".", strip.white=TRUE)
MoroccoYield$ID_1 <- substr(MoroccoYield$ID_1,3,10)
## Reorder the data in the shapefile based on the category variable "ID_1" and change to dataframe
MoroccoReg <- MoroccoReg[order(MoroccoReg$ID_1), ]
MoroccoYield <- cbind(id=rownames(MoroccoReg@data),MoroccoYield)
# build table of labels for annotation (legend).
labs <- do.call(rbind,lapply(1:14,get.centroids))
labs <- merge(labs,MoroccoYield[,c("id","ID_1","Label")],by="id")
labs[,2:3] <- sapply(labs[,2:3],function(x){as.numeric(as.character(x))})
labs$sort <- as.numeric(as.character(labs$ID_1))
labs <- labs[order(labs$sort),]
MoroccoReg.df <- fortify(MoroccoReg)
## This does NOT work...
## Add the yield impacts column to shapefile from the .csv file by "ID_1"
## Note that in the .csv file, I just added the column "ID_1" to match it with the shapefile
#MoroccoReg.df <- cbind(MoroccoReg.df,MoroccoYield,by = 'ID_1')
## Do it this way...
MoroccoReg.df <- merge(MoroccoReg.df,MoroccoYield, by="id")
## Check the structure and contents of shapefile
attributes(MoroccoReg.df)
## Plotting
MoroccoRegMap1 <- ggplot(data = MoroccoReg.df, aes(long, lat, group=id))
MoroccoRegMap1 <- MoroccoRegMap1 + geom_polygon(aes(fill = A2Med_noCO2))
MoroccoRegMap1 <- MoroccoRegMap1 + geom_path(colour = 'gray', linestyle = 2)
MoroccoRegMap1 <- MoroccoRegMap1 + scale_fill_gradient2(name = "%Change in yield",low = "#CC0000",mid = "#FFFFFF",high = "#006600")
MoroccoRegMap1 <- MoroccoRegMap1 + labs(title="SRES_A2, noCO2 Effect")
MoroccoRegMap1 <- MoroccoRegMap1 + coord_equal() #+ theme_map()
MoroccoRegMap1 <- MoroccoRegMap1 + geom_text(data=labs, aes(x=long, y=lat, label=ID_1), size=4)
MoroccoRegMap1 <- MoroccoRegMap1 + annotate("text", x=max(labs$long)-5, y=min(labs$lat)+3-0.5*(1:14),
label=paste(labs$ID_1,": ",labs$Label,sep=""),
size=3, hjust=0)
MoroccoRegMap1
Explanation:
First, on merging your yield data with the map regions: you use
MoroccoReg.df <- cbind(MoroccoReg.df,MoroccoYield,by = 'ID_1')
This is not how cbind(...)
works. cbind(...)
merely combines it's arguments column-wise. It is not a merge function. So you had a data frame, MoroccoReg.df
, with 107,800 rows (one row for every line endpoint on your map), and you are combining it with MoroccoYield
, which has 14 rows (1 for every Region). So cbind(...)
replicates those 14 rows 7700 times to fill out the 107,800 rows it needs. The expression by="ID_1"
merely adds another column named "by"
with "ID_1"
replicated 107,800 times. Run the statement above and type head(MoroccoReg.df)
and look for the last column.
So how to do the merge? There are a number of functions in R
that are supposed to make this easy, but I couldn't get any of them to work. This is what did work:
Every polygon in the shapefile has an ID. There is also an "ID_1" field in the shapefile data section, but these are different and unrelated. [BTW: The ID_1
field in the shapefile data section, and the ID_1
field in your csv
file were also different: the latter has "TR"
prepended to the region number; so that had to be dealt with as well].
Reordering the shapefile with:
MoroccoReg <- MoroccoReg[order(MoroccoReg$ID_1), ]
will change the order of the polygons, but will not change their ID's. It turns out that the polygon ID matches the row name in the data section of the shapefile, so I prepended that (using cbind(...)
!) to your MoroccoYeild
data frame.
MoroccoYield <- cbind(id=rownames(MoroccoReg@data),MoroccoYield)
So now MoroccoYield
has an id
field which maps to the polygon ID, and an ID_1
field, which identifies the Region. Now we can fortify(...)
and merge(...)
. merge(...)
does take a by=
argument.
MoroccoReg.df <- fortify(MoroccoReg)
MoroccoReg.df <- merge(MoroccoReg.df,MoroccoYield, by="id")
This appends all of your MoroccoYield
columns to the appropriate rows of MoroccoReg.df
.
Creating the legend:
The obvious question is how to position the labels? Ideally, we would place the Region number from MoroccoYield$ID_1
at the centroid of each region, and then create a legend that identifies the Regions, based on MoroccoYield$Label
.
So where to find the centroids? These are stored in an obscure location in the polygon section of the shapefile. To make a long story short, I created a utility function get.centroid(...)
which extracts the centroid from a polygon. Then I applied that function to all the polygons to produce a table of centroids with corresponding polygon ID. Then I merged that with the labels in MoroccoYield
. This created a data frame labs
which has the following columns:
id: polygon ID
long: centroid longitude
lat: centroid latitude
ID_1: region ID
label: region name
sort: a sortable (numeric) version of ID_1
Then, adding the following code to your ggplot...
...
MoroccoRegMap1 <- MoroccoRegMap1 + geom_text(data=labs, aes(x=long, y=lat, label=label.id), size=4)
MoroccoRegMap1 <- MoroccoRegMap1 + annotate("text", x=max(labs$long)-5, y=min(labs$lat)+3-0.5*(1:14),
label=paste(labs$label.id,": ",labs$Label,sep=""),
size=3, hjust=0)
...creates the map. There's no clean way, that I could find, to do this with a formal "ggplot legend", so I had to use annotate(...)
. Positioning the annotation is kind of a hack, but it seems to work.
Edit: In response to @smailov83's comment, if you change the code to create the ggplot to this...
MoroccoRegMap1 <- ggplot(data = MoroccoReg.df, aes(long, lat, group=group))
MoroccoRegMap1 <- MoroccoRegMap1 + geom_polygon(aes(fill = A2Med_noCO2))
MoroccoRegMap1 <- MoroccoRegMap1 + geom_path(colour = 'gray', linestyle = 2)
MoroccoRegMap1 <- MoroccoRegMap1 + scale_fill_gradient2(name = "%Change in yield",low = "#CC0000",mid = "#FFFFFF",high = "#006600")
MoroccoRegMap1 <- MoroccoRegMap1 + labs(title="SRES_A2, noCO2 Effect")
MoroccoRegMap1 <- MoroccoRegMap1 + coord_equal() #+ theme_map()
MoroccoRegMap1 <- MoroccoRegMap1 + geom_text(data=labs, aes(x=long, y=lat, label=ID_1, group=ID_1), size=4)
MoroccoRegMap1 <- MoroccoRegMap1 + annotate("text", x=max(labs$long)-5, y=min(labs$lat)+3-0.5*(1:14),
label=paste(labs$ID_1,": ",labs$Label,sep=""),
size=3, hjust=0)
...you get this:
Which I believe fixes the problem. The reason for the extra lines in your map was that the ggplot
must be grouped by the column MoroccoReg.df$group
(so, aes(..., group=group)
not aes(...,group=id)
). When you do this, however, ggplot
tries to group by "group"
in all layers. In geom_text(...)
, where we are using a new, local dataset - the labs
data frame - there is no group
column. To deal with this, we must explicitly set group
to something else in geom_text(...)
. Bottom line: this seem to work.