Question
I was trying the following code and found that StandardScaler (or MinMaxScaler) and Normalizer from sklearn handle data very differently. This discrepancy makes pipeline construction more difficult. I was wondering whether this design difference is intentional.
from sklearn.preprocessing import StandardScaler, Normalizer, MinMaxScaler
For Normalizer, the data is read "horizontally".
Normalizer(norm='max').fit_transform([[ 1., 1., 2., 10],
[ 2., 0., 0., 100],
[ 0., -1., -1., 1000]])
#array([[ 0.1 , 0.1 , 0.2 , 1. ],
# [ 0.02 , 0. , 0. , 1. ],
# [ 0. , -0.001, -0.001, 1. ]])
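As a quick sanity check, the same row-wise rescaling can be reproduced with plain NumPy (a minimal sketch that ignores edge cases such as all-zero rows, which sklearn handles):
import numpy as np

X = np.array([[ 1.,  1.,  2.,   10.],
              [ 2.,  0.,  0.,  100.],
              [ 0., -1., -1., 1000.]])

# norm='max': divide each row by its largest absolute value
row_max = np.abs(X).max(axis=1, keepdims=True)
print(X / row_max)  # matches Normalizer(norm='max').fit_transform(X)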
For StandardScaler and MinMaxScaler, the data is read "vertically".
StandardScaler().fit_transform([[ 1., 1., 2., 10],
[ 2., 0., 0., 100],
[ 0., -1., -1., 1000]])
#array([[ 0. , 1.22474487, 1.33630621, -0.80538727],
# [ 1.22474487, 0. , -0.26726124, -0.60404045],
# [-1.22474487, -1.22474487, -1.06904497, 1.40942772]])
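Again, the column-wise computation can be reproduced in NumPy (a sketch; note that StandardScaler uses the population standard deviation, i.e. ddof=0, which is NumPy's default):
import numpy as np

X = np.array([[ 1.,  1.,  2.,   10.],
              [ 2.,  0.,  0.,  100.],
              [ 0., -1., -1., 1000.]])

# Center each column on its mean and divide by its standard deviation
print((X - X.mean(axis=0)) / X.std(axis=0))  # matches StandardScaler().fit_transform(X)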
MinMaxScaler().fit_transform([[ 1., 1., 2., 10],
[ 2., 0., 0., 100],
[ 0., -1., -1., 1000]])
#array([[0.5 , 1. , 1. , 0. ],
# [1. , 0.5 , 0.33333333, 0.09090909],
# [0. , 0. , 0. , 1. ]])
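And the MinMaxScaler result corresponds to this column-wise formula (a sketch, assuming the default feature_range=(0, 1)):
import numpy as np

X = np.array([[ 1.,  1.,  2.,   10.],
              [ 2.,  0.,  0.,  100.],
              [ 0., -1., -1., 1000.]])

# Rescale each column to [0, 1] using that column's own min and max
col_min, col_max = X.min(axis=0), X.max(axis=0)
print((X - col_min) / (col_max - col_min))  # matches MinMaxScaler().fit_transform(X)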
Answer 1:
This is expected behavior, because StandardScaler and Normalizer serve different purposes. The StandardScaler works 'vertically', because it...
Standardize[s] features by removing the mean and scaling to unit variance
[...] Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using the transform method.
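That last point is worth emphasizing: the statistics are learned per column at fit time and then reused on new data. A small sketch (the training matrix and probe row here are made up for illustration):
from sklearn.preprocessing import StandardScaler

X_train = [[1., 10.], [2., 100.], [0., 1000.]]
scaler = StandardScaler().fit(X_train)  # learns per-column mean and std
print(scaler.mean_)                     # [  1. 370.]
# New data is transformed with the *training* statistics, not its own:
print(scaler.transform([[1., 370.]]))   # [[0. 0.]] -- exactly the training means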
The Normalizer, on the other hand, works 'horizontally', because it...
Normalize[s] samples individually to unit norm.
Each sample (i.e. each row of the data matrix) with at least one non zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one.
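Because each row is rescaled using only its own values, Normalizer is effectively stateless: fit learns nothing, and transform produces the same result per row regardless of what it was fitted on. A brief sketch (the example rows are made up):
from sklearn.preprocessing import Normalizer

normalizer = Normalizer(norm='l2').fit([[3., 4.]])  # fit is a no-op here
# Each row is scaled to unit l2 norm independently of the fit data:
print(normalizer.transform([[3., 4.], [6., 8.]]))
# [[0.6 0.8]
#  [0.6 0.8]]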
Please have a look at the scikit-learn docs for StandardScaler and Normalizer to get more insight into which one serves your purpose better.
Source: https://stackoverflow.com/questions/54115571/why-do-standardscaler-and-normalizer-need-different-data-input