The OP is working through some examples of Linear Regression under different scenarios, comparing the results from using Normalizer and StandardScaler. The last question (3), about the incoherent results with fit_intercept=0 and standardized data, has not been answered fully.
The OP is likely expecting StandardScaler to standardize both X and y, which would make the intercept necessarily 0 (proof 1/3 of the way down). However, StandardScaler ignores y; see the API.
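A quick sketch confirms both points (the data here is generated with make_regression purely for illustration): StandardScaler's fit_transform accepts y but ignores it, and once X and y are both standardized, the fitted intercept collapses to zero because OLS passes through the sample means, which are now both zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=3, bias=10, noise=0.1, random_state=0)

# StandardScaler's signature is fit_transform(X, y=None): y is accepted but ignored,
# so only X comes back standardized.
Xs = StandardScaler().fit_transform(X, y)

# Standardize y by hand, then fit WITH an intercept: the intercept comes out
# at zero (up to floating-point noise), since both means are now 0.
ys = (y - y.mean()) / y.std()
model = LinearRegression(fit_intercept=True).fit(Xs, ys)
print(model.intercept_)  # ~0, up to floating-point noise
```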
TransformedTargetRegressor offers a solution. This approach is also useful for non-linear transformations of the dependent variable, such as the log transformation of y (but consider this).
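For the log case mentioned above, no custom transformer class is needed: TransformedTargetRegressor also accepts a func/inverse_func pair. A minimal sketch, where the positive target y is constructed from make_regression output purely for illustration:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y_lin = make_regression(n_samples=100, n_features=3, noise=0.1, random_state=0)
y = np.exp(y_lin / 200.0)  # illustrative positive target, log-linear in X

log_reg = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,           # applied to y before fitting
    inverse_func=np.exp,   # applied to predictions before scoring
)
log_reg.fit(X, y)
print(log_reg.score(X, y))  # R^2 measured on the original y scale
```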
Here's an example that resolves OP's issue #3:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline
from sklearn.datasets import make_regression
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
import numpy as np
# define a custom transformer that standardizes y
class stdY(BaseEstimator, TransformerMixin):
    def fit(self, Y):
        self.std_err_ = np.std(Y)
        self.mean_ = np.mean(Y)
        return self

    def transform(self, Y):
        return (Y - self.mean_) / self.std_err_

    def inverse_transform(self, Y):
        return Y * self.std_err_ + self.mean_

# pipeline that standardizes X only, with no intercept
# (only standardizing X, so not expecting a great fit by itself)
no_int_pipe = make_pipeline(StandardScaler(), LinearRegression(fit_intercept=False))

# pipeline that also standardizes y: transforms y, estimates the model,
# then reverses the transformation for evaluating the loss
std_lin_reg = TransformedTargetRegressor(regressor=no_int_pipe, transformer=stdY())

# after returning to re-read my answer, there's an even easier solution:
# use StandardScaler as the transformer
std_lin_reg_easy = TransformedTargetRegressor(regressor=no_int_pipe, transformer=StandardScaler())
# generate some simple data
X, y, w = make_regression(
    n_samples=100,
    n_features=3,      # x variables generated and returned
    n_informative=3,   # x variables included in the actual model of y
    effective_rank=3,  # make less than n_informative for multicollinearity
    coef=True,
    noise=0.1,
    random_state=0,
    bias=10,
)
std_lin_reg.fit(X, y)
print('custom transformer on y and no intercept r2_score: ', std_lin_reg.score(X, y))

std_lin_reg_easy.fit(X, y)
print('standard scaler on y and no intercept r2_score: ', std_lin_reg_easy.score(X, y))

no_int_pipe.fit(X, y)
print('\nonly standard scaler and no intercept r2_score: ', no_int_pipe.score(X, y))
which returns
custom transformer on y and no intercept r2_score: 0.9999343800041816
standard scaler on y and no intercept r2_score: 0.9999343800041816

only standard scaler and no intercept r2_score: 0.3319175799267782