I'm working through some examples of Linear Regression under different scenarios, comparing the results of using Normalizer and StandardScaler.

Sklearn de-normalizes the coefficients behind the scenes after calculating them from the normalized input data (Reference). This de-normalization is done so that, for test data, we can apply the coefficients directly and get predictions without first normalizing the test data.
Hence, setting normalize=True does have an impact on the internally computed coefficients, but it does not affect the best-fit line.
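As a rough sketch of that de-normalization mechanism (my own reimplementation for illustration, not sklearn's internal code; note that normalize=True itself was removed from LinearRegression in scikit-learn 1.2): center each feature, scale it to unit l2 norm, fit without an intercept, then rescale the coefficients back and recompute the intercept. The de-normalized model matches a plain fit on the raw data:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(0, 2, size=(50, 2))
y = 5 + X @ np.array([0.5, 2.5]) + rng.normal(0, 1, size=50)

# Plain fit on the raw data.
plain = LinearRegression().fit(X, y)

# Manual version of the normalize=True pipeline: center each column,
# scale it to unit l2 norm, and fit on the scaled data without an intercept.
X_mean = X.mean(axis=0)
X_scale = np.linalg.norm(X - X_mean, axis=0)
scaled_fit = LinearRegression(fit_intercept=False).fit(
    (X - X_mean) / X_scale, y - y.mean())

# De-normalize: rescale the coefficients and recompute the intercept.
coef = scaled_fit.coef_ / X_scale
intercept = y.mean() - X_mean @ coef

print(np.allclose(coef, plain.coef_))           # True
print(np.isclose(intercept, plain.intercept_))  # True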
Normalizer, on the other hand, does the normalization with respect to each sample (meaning row-wise); you can see the reference code here. From the documentation:

Normalize samples individually to unit norm.

whereas normalize=True does the normalization with respect to each column/feature (Reference).
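A tiny example of the difference (the values are chosen purely for illustration):

import numpy as np
from sklearn.preprocessing import Normalizer, normalize

X = np.array([[3.0, 4.0],
              [6.0, 8.0]])

# Row-wise: each sample is scaled to unit l2 norm, so both rows here
# collapse onto the same point [0.6, 0.8].
print(Normalizer().fit_transform(X))
print(normalize(X, axis=1))  # identical to Normalizer

# Column-wise: each feature is scaled to unit l2 norm instead, which
# preserves the relative positions of the samples within a feature.
print(normalize(X, axis=0))

Because row normalization maps every sample onto the unit circle, samples that differ only in magnitude become indistinguishable; that is why the row-normalized panel in the figure below looks so different.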
Here is an example to understand the impact of normalizing along different dimensions of the data. Let us take two features, x1 and x2, and let y be the target variable; the target value is color-coded in the figure.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import Normalizer, StandardScaler, normalize

# Two independent features and a linear target with noise.
n = 50
x1 = np.random.normal(0, 2, size=n)
x2 = np.random.normal(0, 2, size=n)
noise = np.random.normal(0, 1, size=n)
y = 5 + 0.5*x1 + 2.5*x2 + noise

fig, ax = plt.subplots(1, 4, figsize=(20, 6))
ax[0].scatter(x1, x2, c=y)
ax[0].set_title('raw_data', size=15)

# Column-wise normalization: each feature scaled to unit l2 norm.
X = np.column_stack((x1, x2))
column_normalized = normalize(X, axis=0)
ax[1].scatter(column_normalized[:, 0], column_normalized[:, 1], c=y)
ax[1].set_title('column_normalized data', size=15)

# Row-wise normalization: each sample scaled to unit l2 norm.
row_normalized = Normalizer().fit_transform(X)
ax[2].scatter(row_normalized[:, 0], row_normalized[:, 1], c=y)
ax[2].set_title('row_normalized data', size=15)

# Standardization: zero mean and unit variance per feature.
standardized_data = StandardScaler().fit_transform(X)
ax[3].scatter(standardized_data[:, 0], standardized_data[:, 1], c=y)
ax[3].set_title('standardized data', size=15)

plt.subplots_adjust(left=0.3, right=0.9, wspace=0.3)
plt.show()
You can see that the best-fit line for the data in figs. 1, 2, and 4 would be the same, which signifies that the R²-score will not change due to column/feature normalization or standardization; it just ends up with different coefficient values. Note: the best-fit line for fig. 3 (row-normalized data) would be different.
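As a quick check on this claim (reusing X, y, and the transformed arrays from the snippet above), fit a plain LinearRegression on each version of the data and compare the scores:

from sklearn.linear_model import LinearRegression

# Same model on each version of the data; compare R² and coefficients.
for name, data in [('raw', X),
                   ('column_normalized', column_normalized),
                   ('row_normalized', row_normalized),
                   ('standardized', standardized_data)]:
    model = LinearRegression().fit(data, y)
    print(f'{name}: R2={model.score(data, y):.4f}, coef={model.coef_}')

The first, second, and fourth fits report the same R²-score with different coefficient magnitudes, while the row-normalized fit typically scores lower, since Normalizer discards the per-sample scale information.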
A prediction with the intercept forced to zero would be expected to perform badly for problems where the target variable is not scaled (mean = 0). You can see a difference of 22.532 in every row, which shows the impact of the omitted intercept on the output.
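To illustrate (a minimal sketch; the +100 offset is an arbitrary value chosen to push the target mean far from zero, not a number from the example above): a model forced through the origin cannot absorb the offset, so its predictions are off by a near-constant amount per row and its R²-score collapses.

from sklearn.linear_model import LinearRegression

y_shifted = y + 100  # arbitrary offset: target mean is now far from zero

with_intercept = LinearRegression().fit(X, y_shifted)
no_intercept = LinearRegression(fit_intercept=False).fit(X, y_shifted)

# The intercept-free model misses the offset by a near-constant amount
# per row, because the features are roughly zero-mean.
print((with_intercept.predict(X) - no_intercept.predict(X))[:5])
print('with intercept    R2:', with_intercept.score(X, y_shifted))
print('without intercept R2:', no_intercept.score(X, y_shifted))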