Question
I'm still new to R and am learning to use the vegan library. I can easily plot its results in R with the normal plot function; the problem arises when I want to plot them in ggplot. I know I have to extract the right data from the list I've created, but which parts, and how? The dataset I've been practicing on can be downloaded here: https://drive.google.com/file/d/0B1PQGov60aoudVR3dVZBX1VKaHc/view?usp=sharing The code I've been using to transform the data is this:
library(vegan)
library(dplyr)
library(ggplot2)
library(grid)
data <- read.csv(file = "People.csv", header = TRUE, sep = ",", dec = ".", check.names = FALSE, na.strings = c("NA", "-", "?"))
data2 <- data[,-1]
rownames(data2) <- data[,1]
data2 <- scale(data2, center = TRUE, scale = apply(data2, 2, sd))
data2.pca <- rda(data2)
This gives me a list that I can plot using the basic "plot" and "biplot" functions, but I am at a loss as to how to plot both the PCA and the biplot in ggplot. I would also like to color the data points by group, e.g. sex. Any help would be great.
Answer 1:
There is a ggbiplot(...) function in package ggbiplot, but it only works with objects of class prcomp, princomp, PCA, or lda.
plot.rda(...) just locates each case (person) in PC1 - PC2 space. biplot.rda(...) adds vectors for the PC1 and PC2 loadings of each variable in the original dataset. It turns out that plot.rda(...) and biplot.rda(...) use the data produced by summarizing the rda object, not the rda object itself.
smry <- summary(data2.pca)
df1 <- data.frame(smry$sites[,1:2]) # PC1 and PC2
df2 <- data.frame(smry$species[,1:2]) # loadings for PC1 and PC2
rda.plot <- ggplot(df1, aes(x=PC1, y=PC2)) +
  geom_text(aes(label=rownames(df1)), size=4) +
  geom_hline(yintercept=0, linetype="dotted") +
  geom_vline(xintercept=0, linetype="dotted") +
  coord_fixed()
rda.plot
rda.biplot <- rda.plot +
  geom_segment(data=df2, aes(x=0, xend=PC1, y=0, yend=PC2),
               color="red", arrow=arrow(length=unit(0.01,"npc"))) +
  geom_text(data=df2,
            aes(x=PC1, y=PC2, label=rownames(df2),
                hjust=0.5*(1-sign(PC1)), vjust=0.5*(1-sign(PC2))),
            color="red", size=4)
rda.biplot
If you compare these results to plot(data2.pca) and biplot(data2.pca), I think you'll see they are the same. Believe it or not, the hardest part, by far, is getting the text to align properly with respect to the arrows.
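The question also asks about coloring points by group, which the code above does not cover. A minimal sketch of one way to do it follows. It is not run on People.csv: vegan's built-in dune and dune.env data stand in for it, and the Management column plays the role of a grouping variable such as sex (both are assumptions for illustration). It also uses vegan's scores() extractor for the site coordinates rather than summary()$sites, which is a swapped-in way of getting the same kind of coordinates.

```r
library(vegan)
library(ggplot2)

# Stand-in data (assumption): vegan's built-in dune community data and
# its environmental table, which holds a grouping factor "Management".
data(dune)
data(dune.env)

dune.pca <- rda(dune, scale = TRUE)   # unconstrained rda, i.e. a PCA

# Site coordinates on the first two axes, extracted with scores()
sites <- data.frame(scores(dune.pca, display = "sites", choices = 1:2))

# Attach the grouping variable; dune and dune.env share the same row order
sites$group <- dune.env$Management

grouped.plot <- ggplot(sites, aes(x = PC1, y = PC2, colour = group)) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0, linetype = "dotted") +
  geom_vline(xintercept = 0, linetype = "dotted") +
  coord_fixed()
grouped.plot
```

For the People.csv data, the same idea would mean keeping the grouping column out of the matrix passed to rda() and adding it to df1 afterwards, provided the rows are still in the original order.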
Answer 2:
You can use my ggvegan package for this. It is still in development, though usable for some classes of objects, including rda and cca ones.
Assuming the example data and analysis you can simply do:
autoplot(data2.pca, arrows = TRUE)
to get the sort of biplot you want.
You can get site labels via
autoplot(data2.pca, arrows = TRUE, geom = "text", legend = "none")
which also shows how to suppress the legend if required (legend.position takes values suitable for the same theme element in ggplot2).
You don't have a huge amount of control over the look of things with autoplot() methods (yet!), but you can use fortify() to get the data the way ggplot2 requires it and then use ideas from the other answers or study the code for ggvegan:::autoplot.rda for the specifics.
You need to install ggvegan from GitHub as the package is not yet on CRAN:
install.packages("devtools")
devtools::install_github("gavinsimpson/ggvegan")
which will get you version 0.0-6 (or later), which includes some minor tweaks to produce neater plots than previous versions.
Answer 3:
According to @jlhoward, you can use ggbiplot from the package of the same name. Then the only thing you need to do is cast your rda result to a prcomp result, which ggbiplot understands. Here is a function to do that:
#' Cast vegan::rda Result to stats::prcomp
#'
#' Function casts a result object of unconstrained
#' \code{\link[vegan]{rda}} to a \code{\link{prcomp}} result object.
#'
#' @param x An unconstrained \code{\link[vegan]{rda}} result object.
#'
#' @importFrom vegan scores
#' @export
`as.prcomp.rda` <-
    function(x)
{
    if (!is.null(x$CCA) || !is.null(x$pCCA))
        stop("works only with unconstrained rda")
    structure(
        list(sdev = sqrt(x$CA$eig),
             rotation = x$CA$v,
             center = attr(x$CA$Xbar, "scaled:center"),
             scale = if(!is.null(scl <- attr(x$CA$Xbar, "scaled:scale")))
                         scl
                     else
                         FALSE,
             x = scores(x, display = "sites", scaling = 1,
                        choices = seq_len(x$CA$rank),
                        const = sqrt(x$tot.chi * (nrow(x$CA$u)-1)))),
        class = "prcomp")
}
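The cast works because an unconstrained rda is an ordinary PCA, so its eigenvalues and species eigenvectors should line up with what prcomp reports. A small sanity check of that correspondence follows, as a sketch only: the built-in USArrests data are an assumption used as a stand-in for the question's data, and the check compares just the sdev and rotation components that the function fills in.

```r
library(vegan)

# USArrests is a built-in R dataset, used here only as a stand-in (assumption)
pca.rda    <- rda(USArrests, scale = TRUE)     # unconstrained rda = PCA
pca.prcomp <- prcomp(USArrests, scale. = TRUE)

# Standard deviations: square roots of the rda eigenvalues,
# the same quantity the function stores in sdev
sdev.rda <- unname(sqrt(pca.rda$CA$eig))
sdev.ok  <- isTRUE(all.equal(sdev.rda, pca.prcomp$sdev))

# Loadings: axis signs are arbitrary, so compare absolute values
rot.ok <- isTRUE(all.equal(abs(pca.rda$CA$v), abs(pca.prcomp$rotation),
                           check.attributes = FALSE))

print(c(sdev = sdev.ok, rotation = rot.ok))
```

With the function above defined and ggbiplot installed, the intended call would then be along the lines of ggbiplot(as.prcomp.rda(data2.pca)).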
Source: https://stackoverflow.com/questions/32194193/plotting-rda-vegan-in-ggplot