I was surprised to find out that clara from library(cluster) allows NAs, but the function documentation says nothing about how it handles these values.
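A quick way to see this for yourself is to run clara on data that contains a few NAs. The following is only a minimal sketch on made-up toy data (the matrix x and the positions of the NAs are my own assumptions, not from any real data set); it shows that the call goes through, not how the missing values are treated internally.

```r
library(cluster)

# Toy data: two well-separated groups of 50 points each (made up for illustration).
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))

# Knock out a couple of values so some rows are only partially observed.
x[c(3, 60), 1] <- NA

# clara accepts the matrix even though it contains NAs.
fit <- clara(x, k = 2)

# Every row, including the incomplete ones, gets a cluster label.
table(fit$clustering)
```

If too many values are missing, so that no distance can be computed between some pairs of observations, expect clara to stop with an error instead.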
I'm not sure whether kmeans can handle missing data by ignoring the missing values in a row.
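There are two steps in kmeans:
1. assign each observation to the nearest cluster center, which requires a distance calculation;
2. recompute each cluster center as the mean of the observations assigned to it.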
When we have missing data in our observations:
Step 1 can be handled by adjusting the distance metric appropriately, as clara/pam/daisy in the cluster package do. But Step 2 can only be performed if we have some value for each column of an observation. Therefore, imputing might be the next best option for kmeans to deal with missing data.
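As a rough illustration of the imputation route, here is a minimal sketch that fills each NA with its column mean and then runs kmeans on the completed matrix. The toy data and the helper impute_col_mean are hypothetical names I made up for this example; column-mean imputation is just the simplest choice, and a more careful imputation method may be preferable for real data.

```r
# Toy data with a few missing values (made up for illustration).
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
x[c(3, 27), 1] <- NA
x[c(10, 35), 2] <- NA

# Hypothetical helper: replace each NA with the mean of the
# observed values in the same column.
impute_col_mean <- function(m) {
  for (j in seq_len(ncol(m))) {
    missing <- is.na(m[, j])
    m[missing, j] <- mean(m[, j], na.rm = TRUE)
  }
  m
}

x_imp <- impute_col_mean(x)

# With every cell filled in, both kmeans steps are well defined.
fit <- kmeans(x_imp, centers = 2, nstart = 10)
table(fit$cluster)
```

Keep in mind that mean imputation pulls the incomplete rows toward the overall center of the data, so their cluster assignments should be treated with some caution.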