Mahalanobis distance in R between 2 groups

走了就别回头了 2021-01-21 12:27

I have two groups, each with 3 variables, as follows:

Group1:
     cost time quality
[1,]   90    4      70
[2,]    4   27      37
[3,]   82    4         


        
1 Answer
  • 2021-01-21 13:18

    I felt that what you are trying to do must already exist in some R package. After a fairly thorough search, I found the function D.sq in the asbio package, which looks very close. It requires 2 matrices as input, so it doesn't work for your example as it stands. I also include a modified version that accepts a vector in place of the 2nd matrix.

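    # Original function (D.sq, as found in asbio)
    D.sq <- function (g1, g2) {
        # Difference between the two group mean vectors
        dbar <- as.vector(colMeans(g1) - colMeans(g2))
        S1 <- cov(g1)
        S2 <- cov(g2)
        n1 <- nrow(g1)
        n2 <- nrow(g2)
        # Pooled covariance matrix of the two groups
        V <- as.matrix((1/(n1 + n2 - 2)) * (((n1 - 1) * S1) + ((n2 - 1) * S2)))
        # Squared Mahalanobis distance between the group means
        D.sq <- t(dbar) %*% solve(V) %*% dbar
        res <- list()
        res$D.sq <- D.sq
        res$V <- V
        res
    }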
    
    # Data
    g1 <- matrix(c(90, 4, 70, 4, 27, 37, 82, 4, 17, 18, 41, 4), ncol = 3, byrow = TRUE)
    g2 <- c(2, 27, 4)
    
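    # Function modified to accept a vector for g2 rather than a matrix
    D.sq2 <- function (g1, g2) {
        # Difference between the group 1 mean vector and the point g2
        dbar <- as.vector(colMeans(g1) - g2)
        S1 <- cov(g1)
        # var() of a plain vector returns a single number, not a matrix
        S2 <- var(g2)
        n1 <- nrow(g1)
        # length(g2) is the number of values in the vector
        n2 <- length(g2)
        V <- as.matrix((1/(n1 + n2 - 2)) * (((n1 - 1) * S1) + ((n2 - 1) * S2)))
        D.sq <- t(dbar) %*% solve(V) %*% dbar
        res <- list()
        res$D.sq <- D.sq
        res$V <- V
        res
    }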
    

    However, this doesn't quite give the answer you expect: D.sq2(g1,g2)$D.sq returns 2.2469.
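
To see what actually goes into V inside D.sq2, the pieces can be inspected one at a time. This is only a diagnostic sketch using the same g1 and g2 as above. It is not part of asbio, and the variable names simply mirror those in the function.

```r
# Same data as in the answer
g1 <- matrix(c(90, 4, 70, 4, 27, 37, 82, 4, 17, 18, 41, 4), ncol = 3, byrow = TRUE)
g2 <- c(2, 27, 4)

S1 <- cov(g1)      # 3 x 3 covariance matrix of group 1
S2 <- var(g2)      # a single number: the variance of the values 2, 27, 4
n1 <- nrow(g1)     # number of observations in group 1
n2 <- length(g2)   # number of values in g2, i.e. the number of variables

# Because S2 is a scalar, (n2 - 1) * S2 is added to every entry of (n1 - 1) * S1
V <- ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

dim(S1)   # 3 3
S2        # 193
n2        # 3
```

So D.sq2 treats the single point as if it were a sample of size 3 with one shared variance, which may be part of why the result differs from what you expect.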

    Perhaps you can compare your original MATLAB method with these details and find the source of the difference. At a quick look, the difference seems to lie in how the denominator in V is computed. It may well be an error on my part, but hopefully this gets you going.
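
One alternative worth trying, assuming g2 is meant as a single observation rather than a group: with n2 = 1 the second group contributes nothing to the pooled covariance, so V reduces to cov(g1), and base R's mahalanobis() returns the squared distance directly. This is my reading of the problem, not something taken from asbio or from the MATLAB code, so I can't say whether it reproduces the value you expect.

```r
# Same data as in the answer
g1 <- matrix(c(90, 4, 70, 4, 27, 37, 82, 4, 17, 18, 41, 4), ncol = 3, byrow = TRUE)
g2 <- c(2, 27, 4)

center <- colMeans(g1)   # mean vector of group 1
S <- cov(g1)             # covariance of group 1 only (assumption: n2 = 1)

# Squared Mahalanobis distance of the point g2 from the centre of g1
d2 <- mahalanobis(g2, center = center, cov = S)

# The same quantity written out by hand, in the style of D.sq
dbar <- center - g2
d2_manual <- as.numeric(t(dbar) %*% solve(S) %*% dbar)

all.equal(d2, d2_manual)   # TRUE
sqrt(d2)                   # distance on the unsquared scale
```

Note that mahalanobis() returns the squared distance, like D.sq, so take the square root if the MATLAB result is on the unsquared scale.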
