Clustering with NA values in R

一向 2021-02-13 20:07

I was surprised to find that clara from library(cluster) allows NAs, but the function documentation says nothing about how it handles these values.

3 answers

执念已碎 (original poster)

2021-02-13 20:26

Looking at the C source of clara, I noticed that when observations contain missing values, the sum of squares is scaled down in proportion to the number of missing values, which I think is wrong. Line 646 of clara.c reads " dsum *= (nobs / pp) ": it counts the variables that are non-missing in both observations of a pair (nobs), divides that count by the total number of variables (pp), and multiplies the sum of squares by the result. I think the scaling should go the other way, i.e. " dsum *= (pp / nobs) ", so that a sum taken over fewer variables is scaled up rather than down.
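To make the proposed scaling concrete, here is a minimal sketch in C. It is not the code from clara.c: the function name na_scaled_sq_dist and its signature are hypothetical, and it assumes missing values are encoded as NaN. Only the names dsum, nobs and pp are taken from the line quoted above.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not from clara.c): squared Euclidean distance
 * between two observations x and y with pp variables each, where a
 * missing value is assumed to be stored as NaN.
 * The sum over the complete pairs is scaled UP by pp / nobs.
 * If nobs_out is non-NULL, the number of complete pairs is stored there.
 * Returns NaN when the two observations share no observed variable. */
double na_scaled_sq_dist(const double *x, const double *y, int pp,
                         int *nobs_out)
{
    assert(x != NULL && y != NULL && pp > 0);

    double dsum = 0.0; /* sum of squared differences over complete pairs */
    int nobs = 0;      /* variables observed in both x and y */

    for (int j = 0; j < pp; j++) {
        if (isnan(x[j]) || isnan(y[j]))
            continue;  /* skip a variable missing in either observation */
        double d = x[j] - y[j];
        dsum += d * d;
        nobs++;
    }

    if (nobs_out != NULL)
        *nobs_out = nobs;
    if (nobs == 0)
        return NAN;    /* no common variable: distance is undefined */

    /* Cast before dividing so the ratio is not truncated to an integer. */
    return dsum * ((double) pp / nobs);
}
```

For example, with pp = 4 and two observations that share only two observed variables whose squared differences sum to 5, this sketch returns 5 * 4 / 2 = 10, whereas scaling by nobs / pp would give 2.5.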
