Question
In ggplot2, after drawing an ellipse with stat_ellipse, is there any way to calculate the area of that ellipse? Here is the code and the plot:
library(ggplot2)
set.seed(1234)
x <- rnorm(1000)
y <- rnorm(1000)
data <- data.frame(x, y)
ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  stat_ellipse()
Answer 1:
You can calculate the area of the ellipse by finding its semi-major and semi-minor axes (as shown in this SO answer):
# Plot object
p = ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  stat_ellipse(segments = 201) # Default is 51. We use a finer grid for more accurate area.
# Get ellipse coordinates from plot
pb = ggplot_build(p)
el = pb$data[[2]][c("x","y")]
# Center of ellipse
ctr = MASS::cov.trob(el)$center # Per @Roland's comment
# Calculate distance to center from each point on the ellipse
dist2center <- sqrt(rowSums((t(t(el)-ctr))^2))
# Calculate area of ellipse from semi-major and semi-minor axes.
# These are, respectively, the largest and smallest values of dist2center.
pi*min(dist2center)*max(dist2center)
[1] 13.82067
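As a sanity check on the min/max-distance approach, here is a minimal base R sketch that applies it to an ellipse with known semi-axes and compares the result with the exact area. The values of a, b and theta and the helper name ellipse_area_from_points are made up for illustration, and the center is taken as the mean of the vertices instead of MASS::cov.trob, which assumes the vertices are spread evenly in angle.

```r
# Hypothetical helper: area from the largest and smallest vertex-to-center distances.
ellipse_area_from_points <- function(pts) {
  ctr <- colMeans(pts)                       # center, assuming evenly spread vertices
  d <- sqrt(rowSums(sweep(pts, 2, ctr)^2))   # distance of each vertex to the center
  pi * min(d) * max(d)
}

# Test ellipse with known semi-axes (illustrative values), rotated and shifted.
a <- 3; b <- 2; theta <- pi / 6
ang <- seq(0, 2 * pi, length.out = 201)[-1]  # 200 distinct angles, no duplicated endpoint
pts <- cbind(
  x =  1 + a * cos(ang) * cos(theta) - b * sin(ang) * sin(theta),
  y = -2 + a * cos(ang) * sin(theta) + b * sin(ang) * cos(theta)
)

area_est   <- ellipse_area_from_points(pts)
area_exact <- pi * a * b
c(estimate = area_est, exact = area_exact)
```

The estimate is only as good as the grid: the vertices have to land on, or very near, the ends of both axes, which is why a larger segments value helps.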
Answer 2:
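The area can be calculated directly from the covariance matrix by first computing its eigenvalues.
You then need to scale the eigenvalues (the variances along the principal axes) by the factor for the confidence level you want. For a 95% ellipse in two dimensions this factor is 5.991, the 0.95 quantile of the chi-squared distribution with 2 degrees of freedom.
This thread is very helpful.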
set.seed(1234)
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))
cov_dat <- cov(dat) # covariance matrix
eig_dat <- eigen(cov_dat)$values # eigenvalues of the covariance matrix
vec <- sqrt(5.991 * eig_dat) # semi-major and semi-minor axis lengths of the 95% confidence ellipse
pi * vec[1] * vec[2]
#> [1] 18.38858
Created on 2020-02-27 by the reprex package (v0.3.0)
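Since the product of the eigenvalues equals the determinant of the covariance matrix, the same area can be written without an explicit eigendecomposition, and the scaling factor can be taken from qchisq for any confidence level. A minimal sketch under the same normality assumption; conf_ellipse_area is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: area of the confidence ellipse of a bivariate normal
# with covariance matrix `S` at confidence level `level`.
conf_ellipse_area <- function(S, level = 0.95) {
  k <- qchisq(level, df = 2)   # scaling factor; about 5.991 for level = 0.95
  pi * k * sqrt(det(S))        # det(S) is the product of the two eigenvalues
}

set.seed(1234)
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))
S <- cov(dat)

# Determinant form versus the eigenvalue form used above.
area_det <- conf_ellipse_area(S)
area_eig <- pi * prod(sqrt(qchisq(0.95, df = 2) * eigen(S)$values))
c(det = area_det, eig = area_eig)
```

Both entries should agree up to floating-point error; changing level gives the area for other confidence levels.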
In this particular case the covariance between x and y is approximately zero, so the eigenvalues are more or less the variances of the two variables. You can therefore use just the variances (here via the standard deviations) for the calculation, given that both variables are normally distributed.
set.seed(1234)
data <- data.frame(x = rnorm(1000), y = rnorm(1000))
pi * 5.991 * sd(data$x) * sd(data$y) # factor for 95% confidence = 5.991
#> [1] 18.41814
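The calculated value differs from the one in user eipi10's answer (13.82067). This is likely because stat_ellipse does a different calculation under the hood, with different assumptions about the underlying distribution: its default is type = "t", which assumes a multivariate t-distribution, whereas the calculation here assumes a normal distribution. See this thread.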
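To make the difference concrete, here is a sketch that tries to mimic what stat_ellipse does with its defaults. It rests on my reading of the ggplot2 source, so treat the construction as an assumption and not as documented API: for type = "t" the shape matrix comes from MASS::cov.trob, for type = "norm" from the ordinary covariance, and in both cases the radius is sqrt(2 * qf(level, 2, n - 1)). stat_ellipse_area is a hypothetical helper name.

```r
library(MASS)  # for cov.trob

# Hypothetical helper mimicking the assumed stat_ellipse construction.
stat_ellipse_area <- function(df, type = c("t", "norm"), level = 0.95) {
  type <- match.arg(type)
  m <- as.matrix(df)
  n <- nrow(m)
  # Shape matrix: robust multivariate-t fit or ordinary covariance (assumption).
  S <- if (type == "t") MASS::cov.trob(m)$cov else cov(m)
  r2 <- 2 * qf(level, 2, n - 1)   # squared radius of the ellipse (assumption)
  pi * r2 * sqrt(det(S))
}

set.seed(1234)
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))

area_t    <- stat_ellipse_area(dat, "t")
area_norm <- stat_ellipse_area(dat, "norm")
c(t = area_t, norm = area_norm)
```

If the assumption holds, the "t" value should land near the area measured from the plotted ellipse in the first answer, and the "norm" value near the eigenvalue-based result above. Calling stat_ellipse(type = "norm") in the plot should then bring the drawn ellipse in line with the normal-theory calculation.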
Source: https://stackoverflow.com/questions/38782051/how-to-calculate-the-area-of-ellipse-drawn-by-ggplot2