Should I normalize my features before feeding them into an RNN?

执念已碎 2021-02-06 09:57

I am playing with some demos of recurrent neural networks.

I noticed that the scale of my data differs a lot from column to column, so I am considering doing some preprocessing work.

3 Answers
  • 2021-02-06 10:25

    Definitely yes. Most neural networks work best with data between 0 and 1 or between -1 and 1 (depending on the output function). Also, when some inputs are much larger than others, the network will "think" they are more important. This can make learning very slow, because the network must first lower the weights on those inputs.
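The rescaling to a fixed range described above can be sketched in NumPy roughly as follows. This is a minimal illustration, not part of the original answer: the helper `min_max_scale` and the sample array are hypothetical, and it assumes one feature per column.

```python
import numpy as np

def min_max_scale(x, low=0.0, high=1.0):
    """Rescale each column of x linearly so its values span [low, high]."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    # Guard against constant columns, which would otherwise divide by zero.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return low + (x - col_min) / span * (high - low)

# Hypothetical data: two columns on very different scales.
data = np.array([[1.0, 1000.0],
                 [2.0, 3000.0],
                 [3.0, 5000.0]])

scaled_01 = min_max_scale(data)               # each column now spans [0, 1]
scaled_pm1 = min_max_scale(data, -1.0, 1.0)   # each column now spans [-1, 1]
```

Which range is preferable depends on the activation and output functions, as the answer notes. In practice the minimum and maximum would normally be taken from the training set and reused on new data.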

  • 2021-02-06 10:30

    I found this: https://arxiv.org/abs/1510.01378. Normalizing may improve convergence, so training takes less time.

  • 2021-02-06 10:34

    It will be beneficial to normalize your training data. Feeding features with widely different scales to your model will cause the network to weight the features unequally. This can lead to some features being falsely prioritised over others in the representation.

    Although data preprocessing remains a debated topic, both as to when exactly it is necessary and as to how to normalize the data correctly for a given model and application domain, there is a general consensus in machine learning that mean subtraction followed by a general normalization step is helpful.

    In mean subtraction, the mean of each individual feature is subtracted from the data, which geometrically can be interpreted as centering the data around the origin. This holds in any number of dimensions.

    Normalizing the data after the mean subtraction step brings the data dimensions to approximately the same scale. Note that, as mentioned above, the features will lose any prioritization over each other after this step. If you have good reason to think that the different scales of your features carry important information that the network needs in order to understand the underlying patterns in your dataset, then normalization will be harmful. A standard approach is to scale the inputs to have a mean of 0 and a variance of 1.
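For recurrent networks, mean subtraction and scaling to unit variance might look like the sketch below. It assumes the input is a NumPy array of shape (samples, timesteps, features), and that the statistics are computed on the training set only and then reused for other data. The helper names `fit_standardizer` and `standardize` and the random data are hypothetical.

```python
import numpy as np

def fit_standardizer(train):
    """Compute per-feature mean and std over all samples and time steps."""
    # train has shape (samples, timesteps, features); collapse the first two axes.
    flat = train.reshape(-1, train.shape[-1])
    mean = flat.mean(axis=0)
    std = flat.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return mean, std

def standardize(x, mean, std):
    """Subtract the mean, then divide by the standard deviation, per feature."""
    return (x - mean) / std

rng = np.random.default_rng(0)
# Hypothetical data: sequences of 20 time steps with 2 features on different scales.
train = rng.normal(loc=[5.0, 2000.0], scale=[1.0, 300.0], size=(100, 20, 2))
test = rng.normal(loc=[5.0, 2000.0], scale=[1.0, 300.0], size=(10, 20, 2))

mean, std = fit_standardizer(train)
train_std = standardize(train, mean, std)
test_std = standardize(test, mean, std)  # reuse the training statistics
```

Reusing the training mean and standard deviation keeps the test data on the same scale the network saw during training. If the target column is scaled as well, predictions have to be mapped back with `prediction * std + mean` for that column.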

    Further preprocessing operations, such as performing PCA or whitening on your data, may be helpful in specific cases. See the CS231n notes (Setting up the data and the model) for further reference on these topics and for a more detailed explanation of the points above.
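As a rough illustration of PCA and whitening, here is a minimal NumPy sketch that assumes a 2-D array with one sample per row. The function `pca_whiten`, the `eps` smoothing constant, and the sample data are illustrative choices, not taken from the answer.

```python
import numpy as np

def pca_whiten(x, eps=1e-5):
    """Center x, rotate it onto its principal axes, and scale each axis to unit variance."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / centered.shape[0]
    # cov is symmetric positive semi-definite, so its SVD yields the
    # eigenvectors (columns of u) and eigenvalues (s, in descending order).
    u, s, _ = np.linalg.svd(cov)
    rotated = centered @ u                  # PCA: decorrelated features
    whitened = rotated / np.sqrt(s + eps)   # whitening: roughly unit variance per axis
    return rotated, whitened

rng = np.random.default_rng(0)
# Hypothetical data: 500 samples of two correlated features.
base = rng.normal(size=(500, 2))
data = base @ np.array([[2.0, 0.0], [1.5, 0.5]])

rotated, whitened = pca_whiten(data)
white_cov = whitened.T @ whitened / whitened.shape[0]  # close to the identity matrix
```

For sequence data, the time axis would first have to be flattened into the sample axis. Whitening can also amplify noise in low-variance directions, which is one reason it is only helpful in specific cases.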
