Can anyone explain StandardScaler to me?

一整个雨季

I am unable to understand the StandardScaler page in the sklearn documentation.

Can anyone explain this to me in simple terms?

9 answers

清酒与你

2020-12-04 06:33

StandardScaler performs standardization. A dataset usually contains variables that are on different scales. For example, an Employee dataset may contain an AGE column with values in the range 20-70 and a SALARY column with values in the range 10000-80000.
Because these two columns are on such different scales, they are standardized to a common scale before building a machine learning model.
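
As a rough illustration of the AGE/SALARY example above, here is a minimal sketch using scikit-learn. The four employee rows are made-up values chosen for easy arithmetic, not real data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical employee data (made up for illustration):
# column 0 is AGE, column 1 is SALARY.
X = np.array([
    [25, 20000],
    [35, 40000],
    [45, 60000],
    [55, 80000],
], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # learns per-column mean/std, then rescales

print(scaler.mean_)  # per-column means learned from X
print(X_scaled)      # both columns are now on a comparable scale
```

In this particular toy dataset the two columns end up identical after scaling, because each is evenly spaced around its own mean. That will not generally happen with real data, but it shows that the original units no longer matter.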

庸人自扰

2020-12-04 06:39
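How to calculate it: z = (x - mean) / std, where mean and std are the mean and standard deviation of the feature (column).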

You can read more here:
- http://sebastianraschka.com/Articles/2014_about_feature_scaling.html#standardization-and-min-max-scaling
清歌不尽

2020-12-04 06:40

The idea behind StandardScaler is that it transforms your data so that its distribution has a mean of 0 and a standard deviation of 1.
For multivariate data, this is done feature-wise (in other words, independently for each column of the data).
Each value in the dataset has the mean subtracted from it and is then divided by the standard deviation, where the mean and standard deviation are those of the whole dataset (or of that value's own feature in the multivariate case).
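
To make the feature-wise behaviour concrete, the sketch below compares a manual column-by-column computation with StandardScaler. The random data, its three columns, and the seed are arbitrary assumptions made for the example. Note that StandardScaler uses the population standard deviation (equivalent to NumPy's default `ddof=0`).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Arbitrary synthetic data: three features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(loc=[10.0, -3.0, 500.0], scale=[2.0, 0.5, 100.0], size=(200, 3))

# Manual standardization, one mean and one std per column (axis=0).
manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The same operation done by scikit-learn.
scaled = StandardScaler().fit_transform(X)

print(np.allclose(manual, scaled))  # the two results agree
print(scaled.mean(axis=0))          # approximately 0 for every column
print(scaled.std(axis=0))           # approximately 1 for every column
```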

深忆病人

2020-12-04 06:45

This is useful when you want to compare data measured in different units. In that case, you want to remove the units. To do that consistently across all the data, you transform each series so that its variance is 1 and its mean is 0.

爱一瞬间的悲伤

2020-12-04 06:47
We apply StandardScaler() column by column, to every value (row) in the column.

So, for each row in a column (I am assuming that you are working with a Pandas DataFrame):

x_new = (x_original - mean_of_distribution) / std_of_distribution

A few points:
1. It is called StandardScaler because we subtract the mean and divide by the standard deviation of the distribution (the distribution of the feature). Similarly, you can guess what MinMaxScaler() does.
2. The shape of the original distribution stays the same after applying StandardScaler(). It is a common misconception that the distribution gets changed to a Normal Distribution. We are only shifting and rescaling the values so that the mean is 0 and the standard deviation is 1. The result is not confined to [0, 1]; that range is what MinMaxScaler() produces by default.
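
The second point can be checked with a small sketch. It assumes a deliberately skewed (exponential) sample, picked only for illustration, and uses a hand-written `skewness` helper that is not part of scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def skewness(values):
    """Hypothetical helper: third standardized moment (population version)."""
    centered = values - values.mean()
    return np.mean((centered / values.std()) ** 3)

# Arbitrary right-skewed sample, clearly not normally distributed.
rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=1000)

# StandardScaler expects a 2D array, so use a single column.
z = StandardScaler().fit_transform(x.reshape(-1, 1)).ravel()

print(skewness(x), skewness(z))  # same skewness: the shape is unchanged
print(z.min(), z.max())          # values fall outside [0, 1]
```

The skewness is identical before and after scaling, so the data are just as non-normal as before. Only the location and the spread have changed.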
萌比男神i

2020-12-04 06:49

After applying StandardScaler(), each column in X will have a mean of 0 and a standard deviation of 1.

Formulas are listed by others on this page.

Rationale: some algorithms require data to look like this (see sklearn docs).
