Clustering with NA values in R


一向 2021-02-13 20:07

I was surprised to find out that clara from library(cluster) allows NAs, but the function documentation says nothing about how it handles these values.

3 answers

野趣味 (original poster)

2021-02-13 20:36
I am not sure whether kmeans can handle missing data by simply ignoring the missing values in a row.
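A quick way to check this is to run base R's kmeans() on a matrix that contains an NA. The sketch below assumes stats::kmeans with default settings, and the toy matrix is made up for illustration. The call is wrapped in tryCatch() so that any error is captured as a message instead of stopping the script:

```r
# Toy 4 x 2 matrix with one missing value (illustrative data only)
x <- matrix(c(1, 2, NA, 4,
              5, 6, 7, 8), ncol = 2)

# Capture the error message, if any, instead of aborting
res <- tryCatch(
  kmeans(x, centers = 2),
  error = function(e) conditionMessage(e)
)

# If kmeans() refused the NA, res is a character string holding the error text
print(res)
```

In my experience kmeans() stops with an error here rather than skipping the NA, so the missing values have to be dealt with before the call.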

There are two steps in kmeans:
1. Calculating the distance between each observation and the current cluster means.
2. Updating the cluster means based on the newly calculated distances.
When we have missing data in our observations, Step 1 can be handled by adjusting the distance metric appropriately, as the clara/pam/daisy functions in the cluster package do. But Step 2 can only be performed if we have some value for each column of an observation. Therefore, imputing might be the next best option for kmeans to deal with missing data.
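Both ideas can be sketched in base R. This is only an illustration under stated assumptions, not what clara does internally. For Step 1 it uses stats::dist() as a stand-in for daisy: dist() drops the columns where either row has an NA and scales the sum up in proportion to the number of columns actually used. For Step 2 it uses plain column-mean imputation, via a hypothetical helper impute_mean(), before calling kmeans(). All data are made up.

```r
## Step 1: an NA-aware distance (stats::dist used as a stand-in for daisy)
pts <- rbind(a = c(0, 0, 0),
             b = c(3, 4, NA),
             c = c(3, 4, 0))
d <- as.matrix(dist(pts))   # Euclidean by default
# a-b is computed from the first two columns only, then rescaled by 3/2
# inside the square root; a-c uses all three columns.
print(d)

## Step 2: impute, then run kmeans
# Hypothetical helper: replace each NA with the mean of its column.
# (A column that is entirely NA would yield NaN; not handled here.)
impute_mean <- function(m) {
  for (j in seq_len(ncol(m))) {
    miss <- is.na(m[, j])
    m[miss, j] <- mean(m[, j], na.rm = TRUE)
  }
  m
}

set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
x[c(3, 27), 1] <- NA        # knock out two values

x_imp <- impute_mean(x)
fit <- kmeans(x_imp, centers = 2, nstart = 10)
table(fit$cluster)
```

Mean imputation is the crudest choice and pulls the imputed points toward the overall centre, so something smarter (for example, imputing within a preliminary clustering) may be worth trying if many values are missing.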