The OP is working through some examples of Linear Regression under different scenarios, comparing the results from using Normalizer and StandardScaler. The last question (3), about the incoherent results with fit_intercept=0 and standardized data, has not been answered fully.
The OP is likely expecting StandardScaler to standardize both X and y, which would make the intercept necessarily 0 (proof 1/3 of the way down). However, StandardScaler ignores y; see the API.
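A quick sketch confirms both points (the data here is generated with make_regression purely for illustration): StandardScaler's fit_transform accepts y but ignores it, and once X and y are both standardized, the fitted intercept collapses to zero because OLS passes through the sample means, which are now both zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=3, bias=10, noise=0.1, random_state=0)

# StandardScaler's signature is fit_transform(X, y=None): y is accepted but ignored,
# so only X comes back standardized.
Xs = StandardScaler().fit_transform(X, y)

# Standardize y by hand, then fit WITH an intercept: the intercept comes out
# at zero (up to floating-point noise), since both means are now 0.
ys = (y - y.mean()) / y.std()
model = LinearRegression(fit_intercept=True).fit(Xs, ys)
print(model.intercept_)  # ~0, up to floating-point noise
```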
TransformedTargetRegressor offers a solution. This approach is also useful for non-linear transformations of the dependent variable, such as the log transformation of y (but consider this).
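For the log case mentioned above, no custom transformer class is needed: TransformedTargetRegressor also accepts a func/inverse_func pair. A minimal sketch, where the positive target y is constructed from make_regression output purely for illustration:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y_lin = make_regression(n_samples=100, n_features=3, noise=0.1, random_state=0)
y = np.exp(y_lin / 200.0)  # illustrative positive target, log-linear in X

log_reg = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,           # applied to y before fitting
    inverse_func=np.exp,   # applied to predictions before scoring
)
log_reg.fit(X, y)
print(log_reg.score(X, y))  # R^2 measured on the original y scale
```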
Here's an example that resolves OP's issue #3:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline
from sklearn.datasets import make_regression
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
import numpy as np
# define a custom transformer that standardizes y
class stdY(BaseEstimator, TransformerMixin):
    def fit(self, Y):
        self.std_err_ = np.std(Y)
        self.mean_ = np.mean(Y)
        return self

    def transform(self, Y):
        return (Y - self.mean_) / self.std_err_

    def inverse_transform(self, Y):
        return Y * self.std_err_ + self.mean_

# pipeline that standardizes X only, with no intercept
# (only standardizing X, so not expecting a great fit by itself)
no_int_pipe = make_pipeline(StandardScaler(), LinearRegression(fit_intercept=False))

# pipeline that also standardizes y: transforms y, estimates the model,
# then reverses the transformation for evaluating the loss
std_lin_reg = TransformedTargetRegressor(regressor=no_int_pipe, transformer=stdY())

# after returning to re-read my answer, there's an even easier solution:
# use StandardScaler as the transformer
std_lin_reg_easy = TransformedTargetRegressor(regressor=no_int_pipe, transformer=StandardScaler())
# generate some simple data
X, y, w = make_regression(
    n_samples=100,
    n_features=3,      # x variables generated and returned
    n_informative=3,   # x variables included in the actual model of y
    effective_rank=3,  # make less than n_informative for multicollinearity
    coef=True,
    noise=0.1,
    random_state=0,
    bias=10,
)
std_lin_reg.fit(X, y)
print('custom transformer on y and no intercept r2_score: ', std_lin_reg.score(X, y))

std_lin_reg_easy.fit(X, y)
print('standard scaler on y and no intercept r2_score: ', std_lin_reg_easy.score(X, y))

no_int_pipe.fit(X, y)
print('\nonly standard scaler and no intercept r2_score: ', no_int_pipe.score(X, y))
which returns
custom transformer on y and no intercept r2_score: 0.9999343800041816
standard scaler on y and no intercept r2_score: 0.9999343800041816

only standard scaler and no intercept r2_score: 0.3319175799267782