Why do standardscaler and normalizer need different data input?

女生的网名这么多〃 submitted on 2019-12-22 12:19:33

Question


I was trying the following code and found that StandardScaler (or MinMaxScaler) and Normalizer from sklearn handle data very differently. This makes constructing a pipeline more difficult. I was wondering whether this design discrepancy is intentional.

from sklearn.preprocessing import StandardScaler, Normalizer, MinMaxScaler

For Normalizer, the data is read "horizontally" (row by row, one sample at a time).

Normalizer(norm = 'max').fit_transform([[ 1., 1.,  2., 10],
                                        [ 2.,  0.,  0., 100],
                                        [ 0.,  -1., -1., 1000]])
#array([[ 0.1  ,  0.1  ,  0.2  ,  1.   ],
#       [ 0.02 ,  0.   ,  0.   ,  1.   ],
#       [ 0.   , -0.001, -0.001,  1.   ]])
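To make "horizontally" concrete: with norm='max', each row above is divided by its own largest absolute value. The sketch below is an illustration added here, not sklearn's internal code. It recomputes that by hand with NumPy and compares the result with Normalizer's output for the same matrix.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[1., 1., 2., 10.],
              [2., 0., 0., 100.],
              [0., -1., -1., 1000.]])

# Row-wise: each sample (row) is divided by its own largest absolute value.
row_max = np.abs(X).max(axis=1, keepdims=True)
manual = X / row_max

sk = Normalizer(norm='max').fit_transform(X)
print(np.allclose(manual, sk))  # True
```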

For StandardScaler and MinMaxScaler, the data is read "vertically" (column by column, one feature at a time).

StandardScaler().fit_transform([[ 1., 1.,  2., 10],
                                [ 2.,  0.,  0., 100],
                                [ 0.,  -1., -1., 1000]])
#array([[ 0.        ,  1.22474487,  1.33630621, -0.80538727],
#       [ 1.22474487,  0.        , -0.26726124, -0.60404045],
#       [-1.22474487, -1.22474487, -1.06904497,  1.40942772]])
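Likewise, "vertically" for StandardScaler means each column is centered by its own mean and divided by its own standard deviation. This minimal NumPy re-computation assumes the population standard deviation (ddof=0, which is also NumPy's default for np.std). It should reproduce the output above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1., 1., 2., 10.],
              [2., 0., 0., 100.],
              [0., -1., -1., 1000.]])

# Column-wise: mean and standard deviation are computed down each feature (axis=0).
# np.std uses ddof=0 (population standard deviation) by default.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

sk = StandardScaler().fit_transform(X)
print(np.allclose(manual, sk))  # True
```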

MinMaxScaler().fit_transform([[ 1., 1.,  2., 10],
                              [ 2.,  0.,  0., 100],
                              [ 0.,  -1., -1., 1000]])
#array([[0.5       , 1.        , 1.        , 0.        ],
#       [1.        , 0.5       , 0.33333333, 0.09090909],
#       [0.        , 0.        , 0.        , 1.        ]])
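For MinMaxScaler with its default feature_range=(0, 1), each column is shifted by its minimum and divided by its range. The following hand-rolled sketch is for comparison only. Unlike the library class, it does not guard against constant columns, where the range would be zero.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1., 1., 2., 10.],
              [2., 0., 0., 100.],
              [0., -1., -1., 1000.]])

# Column-wise: per-feature minimum and maximum (axis=0).
col_min = X.min(axis=0)
col_max = X.max(axis=0)
# No protection against col_max == col_min in this sketch.
manual = (X - col_min) / (col_max - col_min)

sk = MinMaxScaler().fit_transform(X)
print(np.allclose(manual, sk))  # True
```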

Answer 1:


This is expected behavior, because StandardScaler and Normalizer serve different purposes. The StandardScaler works 'vertically', because it...

Standardize[s] features by removing the mean and scaling to unit variance

[...] Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using the transform method.
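The last sentence of that quote is what makes StandardScaler stateful: fit() learns per-column statistics, and transform() reuses them on any later data. The following small sketch inspects the fitted attributes mean_ and scale_. The extra row X_new is hypothetical and only there to show that the new data's own statistics are not used.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1., 1., 2., 10.],
                    [2., 0., 0., 100.],
                    [0., -1., -1., 1000.]])
# Hypothetical new data, used only to illustrate transform().
X_new = np.array([[1., 0., 1., 370.]])

scaler = StandardScaler().fit(X_train)
print(scaler.mean_)   # per-column means learned from X_train
print(scaler.scale_)  # per-column standard deviations learned from X_train

# transform() applies the stored training statistics to the new row.
out = scaler.transform(X_new)
manual = (X_new - scaler.mean_) / scaler.scale_
print(np.allclose(out, manual))  # True
```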

while the Normalizer works 'horizontally', because it...

Normalize[s] samples individually to unit norm.

Each sample (i.e. each row of the data matrix) with at least one non zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one.
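Normalizer, by contrast, learns nothing during fit(). Each row is rescaled using only its own values. The minimal sketch below uses a made-up matrix (hypothetical data) and the default norm='l2'. It also shows that an all-zero row, which has no nonzero component, is left as it is.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical data chosen so the l2 norms are easy to check by hand.
X = np.array([[3., 4.],
              [0., 0.],
              [1., 1.]])

out = Normalizer().fit_transform(X)  # default norm='l2'

# Nonzero rows now have unit Euclidean length; the zero row keeps norm 0.
print(np.linalg.norm(out, axis=1))
print(out[1])  # the all-zero row is unchanged

# No statistics are learned, so a row transformed on its own
# gives the same result as when it is part of a larger batch.
single = Normalizer().fit_transform(X[[0]])
print(np.allclose(single, out[[0]]))  # True
```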

Have a look at the scikit-learn documentation for StandardScaler and Normalizer to get more insight into which one better serves your purpose.
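On the pipeline concern from the question: both transformers accept the same (n_samples, n_features) input and differ only in the axis they work along, so they can be chained without any reshaping. The sketch below assumes, for illustration only, that you want to standardize the features first and then scale each sample to unit l2 length. Whether that combination makes sense depends on your model.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, Normalizer

X = np.array([[1., 1., 2., 10.],
              [2., 0., 0., 100.],
              [0., -1., -1., 1000.]])

# Step 1 standardizes each column; step 2 rescales each resulting row.
pipe = make_pipeline(StandardScaler(), Normalizer())
out = pipe.fit_transform(X)

print(out.shape)                     # same shape as the input
print(np.linalg.norm(out, axis=1))   # every row now has unit l2 norm
```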



Source: https://stackoverflow.com/questions/54115571/why-do-standardscaler-and-normalizer-need-different-data-input
