Comparing Results from StandardScaler vs Normalizer in Linear Regression

无人及你 2021-02-18 21:40

I'm working through some examples of Linear Regression under different scenarios, comparing the results from using Normalizer and StandardScaler, and

3 answers
  •  北恋 (original poster)
    2021-02-18 22:07

    The last question (3) about the incoherent results with fit_intercept=0 and standardized data has not been answered fully.

    The OP is likely expecting StandardScaler to standardize both X and y, which would force the fitted intercept to be exactly 0 (proof 1/3 of the way down).
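
A minimal sketch of why the intercept vanishes, assuming synthetic data from make_regression (the dataset parameters and the names X_std, y_std are illustrative and not taken from the question). For ordinary least squares the intercept equals mean(y) minus the coefficients times mean(X), so once both are centered it should be zero up to floating-point error:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Illustrative data (assumed): 3 features and a non-zero bias of 10.
X, y = make_regression(n_samples=100, n_features=3, noise=0.1,
                       bias=10, random_state=0)

# Standardize X and y by hand, using the population standard deviation
# (the same convention StandardScaler uses).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
y_std = (y - y.mean()) / y.std()

# Fit WITH an intercept and inspect what the model estimates for it.
model = LinearRegression(fit_intercept=True).fit(X_std, y_std)
intercept = model.intercept_
print(f"fitted intercept: {intercept:.2e}")
```

Because the estimated intercept is numerically zero here, dropping it with fit_intercept=False should not change the fit, provided y has really been centered.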

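    However, StandardScaler ignores y (see the API): its fit and transform methods operate on X only, so inside a pipeline the target reaches the regressor unscaled.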
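
A quick check of this behaviour, as a sketch on random data (the arrays and variable names are assumptions made for illustration). Passing y to fit is accepted for pipeline compatibility, but it does not affect the learned statistics, and transform returns only the scaled X:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(50, 2))
y = rng.normal(loc=10.0, scale=3.0, size=50)

# Fit once with y and once without: the learned statistics should match.
scaler_with_y = StandardScaler().fit(X, y)
scaler_without_y = StandardScaler().fit(X)
same_stats = (np.allclose(scaler_with_y.mean_, scaler_without_y.mean_)
              and np.allclose(scaler_with_y.scale_, scaler_without_y.scale_))

# transform takes X and returns the scaled X; y is never modified.
X_scaled = scaler_with_y.transform(X)
print("statistics identical:", same_stats)
print("column means of scaled X:", X_scaled.mean(axis=0))
print("mean of y (unchanged):", y.mean())
```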

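    TransformedTargetRegressor offers a solution: it transforms y before fitting the regressor and applies the inverse transformation to the predictions. This approach is also useful for non-linear transformations of the dependent variable, such as a log transformation of y (but consider this).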
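
For the log-transformation case, a minimal sketch using the func and inverse_func parameters of TransformedTargetRegressor. The data-generating process (a target that is linear in X on the log scale, with a true slope of 0.5) is an assumption chosen only to make the effect visible:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=(200, 1))
# Assumed model: log(y) = 1.0 + 0.5 * x + small noise, so y is strictly positive.
y = np.exp(1.0 + 0.5 * X[:, 0] + rng.normal(scale=0.05, size=200))

# The regressor is fitted on log(y); predictions are mapped back with exp.
log_reg = TransformedTargetRegressor(regressor=LinearRegression(),
                                     func=np.log, inverse_func=np.exp)
log_reg.fit(X, y)

slope = log_reg.regressor_.coef_[0]   # slope on the log scale
preds = log_reg.predict(X)            # predictions on the original scale
print("slope on the log scale:", slope)
print("R^2 on the original scale:", log_reg.score(X, y))
```

The fitted inner model is available as regressor_, so its coefficients are expressed on the transformed scale, while predict and score work on the original scale of y.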

    Here's an example that resolves OP's issue #3:

    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.pipeline import make_pipeline
    from sklearn.datasets import make_regression
    from sklearn.compose import TransformedTargetRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import StandardScaler
    import numpy as np
    
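    # Define a custom transformer that standardizes y (zero mean, unit variance).
    class stdY(BaseEstimator, TransformerMixin):
        def fit(self, Y, y=None):
            # Learn the mean and (population) standard deviation of the target.
            self.std_ = np.std(Y)
            self.mean_ = np.mean(Y)
            return self

        def transform(self, Y):
            return (Y - self.mean_) / self.std_

        def inverse_transform(self, Y):
            # Map standardized values back to the original scale of y.
            return Y * self.std_ + self.mean_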
    
    # Pipeline that standardizes X and fits without an intercept.
    # Only X is standardized here, so a great fit is not expected from it alone.
    no_int_pipe = make_pipeline(StandardScaler(), LinearRegression(fit_intercept=False))

    # Wrap the pipeline so that y is standardized as well: y is transformed, the
    # model is estimated, and the transformation is reversed before evaluating loss.
    std_lin_reg = TransformedTargetRegressor(regressor=no_int_pipe, transformer=stdY())

    # After returning to re-read my answer, there's an even easier solution:
    # use StandardScaler as the transformer.
    std_lin_reg_easy = TransformedTargetRegressor(regressor=no_int_pipe, transformer=StandardScaler())
    
    # generate some simple data
    X, y, w = make_regression(n_samples=100,
                              n_features=3, # x variables generated and returned 
                              n_informative=3, # x variables included in the actual model of y
                              effective_rank=3, # make less than n_informative for multicollinearity
                              coef=True,
                              noise=0.1,
                              random_state=0,
                              bias=10)
    
    std_lin_reg.fit(X, y)
    print('custom transformer on y and no intercept r2_score: ', std_lin_reg.score(X, y))

    std_lin_reg_easy.fit(X, y)
    print('standard scaler on y and no intercept r2_score: ', std_lin_reg_easy.score(X, y))

    no_int_pipe.fit(X, y)
    print('\nonly standard scaler and no intercept r2_score: ', no_int_pipe.score(X, y))


    which returns

    custom transformer on y and no intercept r2_score:  0.9999343800041816
    standard scaler on y and no intercept r2_score:  0.9999343800041816

    only standard scaler and no intercept r2_score:  0.3319175799267782
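
As a sanity check on the scores reported here, the sketch below compares the no-intercept model on standardized X and y with a plain LinearRegression that fits an intercept on the raw data. The dataset settings are assumed for illustration and are not identical to the ones in the answer. For a full-rank X the two models should produce the same predictions up to floating-point error:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data (assumed): 3 informative features, bias of 10.
X, y = make_regression(n_samples=100, n_features=3, n_informative=3,
                       noise=0.1, bias=10, random_state=0)

# Reference model: ordinary least squares with an intercept on raw data.
with_intercept = LinearRegression(fit_intercept=True).fit(X, y)

# No-intercept model on standardized X, with y standardized by the wrapper.
no_intercept = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(),
                            LinearRegression(fit_intercept=False)),
    transformer=StandardScaler(),
)
no_intercept.fit(X, y)

# Largest absolute disagreement between the two sets of predictions.
max_diff = np.max(np.abs(with_intercept.predict(X) - no_intercept.predict(X)))
print("max prediction difference:", max_diff)
```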
    
