clustering with NA values in R

一向 2021-02-13 20:07

I was surprised to find out that clara from library(cluster) allows NAs, but the function documentation says nothing about how it handles these values.

3 answers
  •  野趣味 (original poster)
    2021-02-13 20:36

    I'm not sure kmeans can handle missing data simply by ignoring the missing values in a row.

    There are two steps in kmeans:

    1. Calculating the distance between each observation and each current cluster mean.
    2. Updating the cluster means based on the newly calculated distances.

    When observations contain missing data, Step 1 can be handled by adjusting the distance metric appropriately, as clara, pam and daisy in the cluster package do. But Step 2 can only be performed if we have some value for each column of an observation. Imputing the missing values might therefore be the next best option for kmeans to deal with missing data.
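
The contrast described above can be tried on a small example. This is only a sketch under stated assumptions: the toy matrix x, the positions of the NA cells, and the column-mean imputation are all made up for illustration, and mean imputation is just one simple choice among many, not something the answer prescribes.

```r
library(cluster)  # provides daisy(), clara(), pam()

set.seed(1)
# Hypothetical toy data: two well-separated groups of 20 rows, 3 numeric columns.
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 3),
           matrix(rnorm(60, mean = 5), ncol = 3))
x[cbind(c(2, 7, 25), c(1, 3, 2))] <- NA  # set three cells to NA

# Step 1 with NAs: daisy() leaves the missing values of a row out of the
# dissimilarities involving that row, so a distance is still defined as
# long as two rows share at least one observed column.
d <- daisy(x, metric = "euclidean")
anyNA(d)

# Medoid-based methods only need dissimilarities, so clara() accepts x as is.
cl_clara <- clara(x, k = 2)

# kmeans() from stats does not accept NAs and stops with an error.
km_raw <- tryCatch(kmeans(x, centers = 2), error = function(e) e)
inherits(km_raw, "error")

# Assumed workaround: replace each NA by the mean of its column, then cluster.
x_imp <- x
for (j in seq_len(ncol(x_imp))) {
  miss <- is.na(x_imp[, j])
  x_imp[miss, j] <- mean(x_imp[, j], na.rm = TRUE)
}
km <- kmeans(x_imp, centers = 2, nstart = 10)
table(km$cluster)
```

Column-mean imputation pulls the filled-in values toward the overall centre of the data, so with many missing cells a more careful imputation method may be preferable before running kmeans.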
