Distance between vectors with missing values

Submitted by 拜拜、爱过 on 2021-02-07 02:58:57

Question


For vectors A and B, the Euclidean distance is: sqrt((A1-B1)^2 + (A2-B2)^2 + ... + (An-Bn)^2)

A <- c(5, 4, 3, 2, 1, 1, 2, 3, 5)
B <- c(1, 0, 6, 4, 3, 2, 3, 1, 3)
dist(rbind(A, B), method = "euclidean")
#          A
# B 7.681146
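The first result can be reproduced directly from the formula. This is a small check that re-creates A and B so it runs on its own, and uses all.equal() to compare the hand computation with dist().

```r
# Vectors from the question (no missing values)
A <- c(5, 4, 3, 2, 1, 1, 2, 3, 5)
B <- c(1, 0, 6, 4, 3, 2, 3, 1, 3)

# Euclidean distance straight from the definition:
# square the coordinate differences, sum them, take the square root
d_manual <- sqrt(sum((A - B)^2))

# The same value via dist(); as.numeric() drops the "dist" class
d_dist <- as.numeric(dist(rbind(A, B), method = "euclidean"))

print(d_manual)                      # [1] 7.681146
print(all.equal(d_manual, d_dist))   # [1] TRUE
```

Here the squared differences are 16, 16, 9, 4, 4, 1, 1, 4, 4, which sum to 59, and sqrt(59) is about 7.681146.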

How is the distance calculated when vectors A and B contain missing values? In the example below, R reports a distance of 8.485281, but how is that value obtained?

A <- c(5, NA, NA, NA, 1, 1, 2, 3, 5)
B <- c(1, 0, 6, NA, NA, NA, NA, 1, 3)
dist(rbind(A, B), method = "euclidean")
#          A
# B 8.485281

Answer 1:


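Coordinates where either vector has an NA are dropped first. The Euclidean distance is then computed from the remaining coordinates and scaled up by sqrt(n / m), where n is the full vector length and m is the number of coordinates actually used, to account for the larger dimension of the full sample: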

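i <- is.na(A) | is.na(B)  # TRUE where either vector is missing
# distance over the complete coordinates, scaled by sqrt(n / m)
dist(rbind(A = A[!i], B = B[!i])) * sqrt(length(A) / length(A[!i]))
#          A
# B 8.485281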
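The same rule can be packaged as a small function and compared with dist(). This is a minimal sketch: the name euclid_na is hypothetical, it handles only a pair of vectors, and it mirrors the behaviour described above (drop incomplete coordinates, scale the sum of squares by n / m, return NA when no coordinate is complete).

```r
# Hypothetical helper: Euclidean distance between two vectors with NAs,
# following the drop-and-rescale rule described above
euclid_na <- function(x, y) {
  stopifnot(length(x) == length(y))
  ok <- !is.na(x) & !is.na(y)     # coordinates present in both vectors
  m <- sum(ok)                    # number of usable coordinates
  if (m == 0) return(NA_real_)    # nothing left to compare
  n <- length(x)                  # full dimension
  ss <- sum((x[ok] - y[ok])^2)    # sum of squares over usable coordinates
  sqrt(ss * n / m)                # scale the sum up to n coordinates
}

A <- c(5, NA, NA, NA, 1, 1, 2, 3, 5)
B <- c(1, 0, 6, NA, NA, NA, NA, 1, 3)

d_manual <- euclid_na(A, B)
d_dist <- as.numeric(dist(rbind(A, B)))

print(d_manual)                      # [1] 8.485281
print(all.equal(d_manual, d_dist))   # [1] TRUE
```

For this example only coordinates 1, 8 and 9 are present in both vectors, so the sum of squares is 16 + 4 + 4 = 24, and the scaled distance is sqrt(24 * 9 / 3) = sqrt(72), about 8.485281. According to the ?dist help page, the same proportional scaling of the sum is applied for the Manhattan, Canberra and Minkowski methods, and the result is NA when every coordinate is excluded.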


Source: https://stackoverflow.com/questions/23176821/distance-between-vectors-with-missing-values
