Comparing Results from StandardScaler vs Normalizer in Linear Regression

无人及你 2021-02-18 21:40

I'm working through some examples of Linear Regression under different scenarios, comparing the results from using Normalizer and StandardScaler, and …

3 Answers
  •  旧时难觅i
    2021-02-18 22:13

    Answer to Q1

    I am assuming that what you mean with the first 2 models is reg1 and reg2. Let us know if that is not the case.

    A linear regression has the same predictive power whether you normalize the data or not. Therefore, using normalize=True has no impact on the predictions. One way to understand this is to see that normalization (column-wise) is a linear operation on each of the columns ((x-a)/b), and a linear transformation of the data does not affect a linear regression's predictive power; it only changes the values of the coefficients. Notice that this statement is not true for Lasso/Ridge/ElasticNet.
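
    A quick way to verify this claim is to fit on the raw columns and on an affinely transformed copy and compare the predictions. A minimal sketch with made-up data (the transform here happens to be standardisation, but any column-wise (x-a)/b works):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(0, 10, size=(100, 2))
y = 3 + X @ np.array([2.0, 1.0]) + rng.normal(0, 1, size=100)

# column-wise affine transform (x - a) / b
a, b = X.mean(axis=0), X.std(axis=0)
Z = (X - a) / b

reg_x = LinearRegression().fit(X, y)
reg_z = LinearRegression().fit(Z, y)

# identical fitted functions: predictions agree on the training data
np.testing.assert_allclose(reg_x.predict(X), reg_z.predict(Z))
```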

    So, why aren't the coefficients different? Well, normalize=True also takes into account that what the user normally wants is the coefficients on the original features, not the normalised features. As such, it adjusts the coefficients. One way to check that this makes sense is to use a simpler example:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    
    # two features, normally distributed with sigma=10
    x1 = np.random.normal(0, 10, size=100)
    x2 = np.random.normal(0, 10, size=100)
    
    # y is related to each of them plus some noise
    y = 3 + 2*x1 + 1*x2 + np.random.normal(0, 1, size=100)
    
    X = np.array([x1, x2]).T  # X has two columns
    
    reg1 = LinearRegression().fit(X, y)
    # note: the normalize parameter was deprecated in scikit-learn 1.0 and removed in 1.2
    reg2 = LinearRegression(normalize=True).fit(X, y)
    
    # check that coefficients are the same and equal to [2,1]
    np.testing.assert_allclose(reg1.coef_, reg2.coef_) 
    np.testing.assert_allclose(reg1.coef_, np.array([2, 1]), rtol=0.01)
    

    This confirms that both methods correctly capture the real signal between [x1, x2] and y, namely the coefficients 2 and 1 respectively.

    Answer to Q2

    Normalizer is not what you would expect. It normalises each sample row-wise. So the results will change dramatically, and this will likely destroy the relationship between the features and the target, which you generally want to avoid except in specific cases (e.g. TF-IDF).

    To see how, take the example above, but add a third feature, x3, that is not related to y. Using Normalizer causes x1 to be modified by the value of x3, decreasing the strength of its relationship with y.
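
    The row-wise behaviour is easy to see on a tiny made-up matrix: Normalizer divides each sample by its own norm, so each feature value now depends on the other features in the same row.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0],
              [1.0, 0.0]])

# Normalizer (default norm='l2') rescales each ROW to unit length,
# unlike StandardScaler, which operates column-wise
Xn = Normalizer().fit_transform(X)
print(Xn)  # [[0.6, 0.8], [1.0, 0.0]] -- row 0 is divided by its norm, 5
```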

    Discrepancy of coefficients between models (1,2) and (4,5)

    The discrepancy between the coefficients is that when you standardise before fitting, the coefficients will be with respect to the standardised features, the same coefficients I referred to in the first part of the answer. They can be mapped back to the original features using reg4.coef_ / scaler.scale_:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import StandardScaler
    
    x1 = np.random.normal(0, 10, size=100)
    x2 = np.random.normal(0, 10, size=100)
    y = 3 + 2*x1 + 1*x2 + np.random.normal(0, 1, size=100)
    X = np.array([x1, x2]).T
    
    reg1 = LinearRegression().fit(X, y)
    reg2 = LinearRegression(normalize=True).fit(X, y)
    scaler = StandardScaler()
    reg4 = LinearRegression().fit(scaler.fit_transform(X), y)
    
    np.testing.assert_allclose(reg1.coef_, reg2.coef_) 
    np.testing.assert_allclose(reg1.coef_, np.array([2, 1]), rtol=0.01)
    
    # map the coefficients on standardised features back to the original scale
    coefficients = reg4.coef_ / scaler.scale_
    np.testing.assert_allclose(coefficients, np.array([2, 1]), rtol=0.01)
    

    This is because, mathematically, setting z = (x - mu)/sigma, the model reg4 is solving y = a1*z1 + a2*z2 + a0. We can recover the relationship between y and x through simple algebra: y = a1*[(x1 - mu1)/sigma1] + a2*[(x2 - mu2)/sigma2] + a0, which can be simplified to y = (a1/sigma1)*x1 + (a2/sigma2)*x2 + (a0 - a1*mu1/sigma1 - a2*mu2/sigma2).

    reg4.coef_ / scaler.scale_ represents [a1/sigma1, a2/sigma2] in the above notation, which is exactly what normalize=True does to guarantee that the coefficients are the same.
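
    The intercept term (a0 - a1*mu1/sigma1 - a2*mu2/sigma2) can be recovered the same way. A self-contained sketch repeating the toy setup from above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x1 = rng.normal(0, 10, size=100)
x2 = rng.normal(0, 10, size=100)
y = 3 + 2*x1 + 1*x2 + rng.normal(0, 1, size=100)
X = np.array([x1, x2]).T

reg1 = LinearRegression().fit(X, y)
scaler = StandardScaler()
reg4 = LinearRegression().fit(scaler.fit_transform(X), y)

# a0 - a1*mu1/sigma1 - a2*mu2/sigma2 recovers the intercept on the
# original features, matching the model fitted without scaling
intercept = reg4.intercept_ - np.sum(reg4.coef_ * scaler.mean_ / scaler.scale_)
np.testing.assert_allclose(intercept, reg1.intercept_)
```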

    Discrepancy in the score of model 5

    Standardised features are zero-mean, but the target variable is not necessarily zero-mean. Therefore, not fitting the intercept causes the model to disregard the mean of the target. In the example that I have been using, the "3" in y = 3 + ... is not fitted, which naturally decreases the predictive power of the model. :)
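
    Assuming "model 5" is a LinearRegression with fit_intercept=False on the standardised features, as described above, this can be checked directly on the same toy data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x1 = rng.normal(0, 10, size=100)
x2 = rng.normal(0, 10, size=100)
y = 3 + 2*x1 + 1*x2 + rng.normal(0, 1, size=100)
X = StandardScaler().fit_transform(np.array([x1, x2]).T)

with_intercept = LinearRegression().fit(X, y).score(X, y)
without = LinearRegression(fit_intercept=False).fit(X, y).score(X, y)

# without an intercept, the mean of y (the "3") cannot be captured,
# so the training R^2 is strictly worse
assert without < with_intercept
```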
