Does KMeans normalize features automatically in sklearn?

臣服心动 2021-02-13 19:41

I was wondering if KMeans automatically normalizes the features before doing clustering. There seems to be no option to provide an input to ask for normalization.

1 answer
  •  -上瘾入骨i
    2021-02-13 20:27

    Data preprocessing (normalization, binning, weighting, etc.) is a separate step from applying a machine learning algorithm, and KMeans does not normalize the features for you. Use sklearn.preprocessing for data preprocessing. Several preprocessors can also be chained so that the data passes through them one after another.
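
    As a rough illustration of chaining preprocessors, here is a minimal sketch. The toy array X and the choice of a log transform as the first step are assumptions made only for this example; any transformers from sklearn.preprocessing could be chained the same way.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Toy data (an assumption for illustration): the first column spans
# several orders of magnitude, the second varies only slightly.
X = np.array([
    [1.0, 200.0],
    [10.0, 220.0],
    [100.0, 180.0],
    [1000.0, 210.0],
])

# Chain two preprocessors: compress the range with log1p, then
# standardize each column to zero mean and unit variance.
preprocess = make_pipeline(
    FunctionTransformer(np.log1p),
    StandardScaler(),
)
X_prepared = preprocess.fit_transform(X)

print(X_prepared.mean(axis=0))  # per-column mean after preprocessing
print(X_prepared.std(axis=0))   # per-column standard deviation
```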

    As for K-means, centering the features (normalizing only the mean) is often not sufficient. K-means is sensitive to the variance of each feature, and features with larger variance carry more weight in the result, so the data is usually rescaled to give every feature the same variance. For K-means I would therefore recommend StandardScaler for data preprocessing.
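
    To make the effect of variance concrete, here is a small sketch on synthetic data (the data, the group sizes, and the helper agreement are assumptions made up for this example). Two groups differ only in a small-scale feature, while a second feature is pure noise on a much larger scale; KMeans is fitted once on the raw data and once behind a StandardScaler.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data: feature 0 separates the two groups (small scale),
# feature 1 is uninformative noise with a very large variance.
group_a = np.column_stack([rng.normal(0.0, 0.1, 100), rng.normal(0.0, 1000.0, 100)])
group_b = np.column_stack([rng.normal(1.0, 0.1, 100), rng.normal(0.0, 1000.0, 100)])
X = np.vstack([group_a, group_b])
true_groups = np.repeat([0, 1], 100)


def agreement(labels, truth):
    """Share of points grouped as in `truth`, ignoring which cluster is called 0 or 1."""
    acc = np.mean(labels == truth)
    return max(acc, 1.0 - acc)


# KMeans on the raw data: the large-variance noise feature dominates the distances.
raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# KMeans after standardization: both features contribute on an equal footing.
scaled = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=2, n_init=10, random_state=0),
).fit(X)
scaled_labels = scaled.named_steps["kmeans"].labels_

print("agreement without scaling:", agreement(raw.labels_, true_groups))
print("agreement with StandardScaler:", agreement(scaled_labels, true_groups))
```

    Putting the scaler and KMeans in one pipeline also means that new data passed to predict is scaled with the statistics learned during fit.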

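    Also remember that k-means results can vary from run to run: they depend on the initial cluster centers and can depend on the order of the observations. It is worth running the algorithm several times, shuffling the data in between, averaging the resulting cluster centers (after matching corresponding clusters across runs), and then running the final evaluation with those averaged centers as starting points. Note that sklearn's KMeans already restarts from several initializations via its n_init parameter and keeps the run with the lowest inertia.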
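
    A hedged sketch of the repeated-runs idea follows. It is a simplification rather than the exact procedure described above: instead of averaging centers across runs, it keeps the centers of the run with the lowest inertia and uses them as starting points for a final fit. The synthetic blobs and the number of runs are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic data (an assumption): three Gaussian blobs, standardized.
blob_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(0.0, 1.0, size=(50, 2)) for c in blob_centers])
X = StandardScaler().fit_transform(X)

# Run k-means several times, shuffling the rows and changing the seed each time.
runs = []
for seed in range(5):
    X_shuffled = X[rng.permutation(len(X))]
    runs.append(KMeans(n_clusters=3, n_init=1, random_state=seed).fit(X_shuffled))

inertias = [model.inertia_ for model in runs]
best = min(runs, key=lambda model: model.inertia_)

# Final fit on the data in its original order, starting from the best centers found.
final = KMeans(n_clusters=3, init=best.cluster_centers_, n_init=1).fit(X)

print("inertia per run:", inertias)
print("final inertia:", final.inertia_)
```

    In practice, setting n_init to a value greater than 1 on a single KMeans instance achieves much the same selection without the manual loop.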
