Right order of doing feature selection, PCA and normalization?

野性不改 2021-02-05 16:00

I know that feature selection helps me remove features that may have a low contribution. I know that PCA helps reduce possibly correlated features into fewer components, reducing the dimensionality. In what order should I apply normalization, PCA, and feature selection?

3 Answers
  •  野性不改
    2021-02-05 16:30

    If I were building a classifier of some sort, I would personally use this order:

    1. Normalization
    2. PCA
    3. Feature Selection

    Normalization: You normalize first to bring the data into comparable ranges. If you have data (x, y) where x ranges from -1000 to +1000 and y ranges from -1 to +1, any distance metric would automatically treat a change in y as less significant than a change in x. We don't yet know that is actually the case, so we normalize the data first.
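    The scaling step above can be sketched with plain z-score normalization in NumPy. The data here is made up to mirror the answer's example (x in [-1000, 1000], y in [-1, 1]); the variable names are mine, not from the question:

    ```python
    import numpy as np

    # Hypothetical data matching the example: x spans a much wider range
    # than y, so raw Euclidean distances would be dominated by x.
    rng = np.random.default_rng(0)
    X = np.column_stack([rng.uniform(-1000, 1000, 200),
                         rng.uniform(-1, 1, 200)])

    # z-score normalization: zero mean, unit variance per column
    X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

    print(X_norm.mean(axis=0))  # both columns now centered near 0
    print(X_norm.std(axis=0))   # both columns now have unit spread
    ```

    After this, a unit step in either column represents the same "amount" of that feature's natural variation, so distance metrics treat the two features fairly.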

    PCA: Uses the eigenvalue decomposition of the data's covariance matrix to find an orthogonal basis that describes the variance in the data points. If you have 4 features, PCA can show you that only 2 directions really differentiate the data points, which brings us to the last step.
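    A minimal NumPy sketch of that eigendecomposition view of PCA, on made-up data where 4 features secretly carry only 2 independent directions of variance (the data and names are illustrative, not from the question):

    ```python
    import numpy as np

    # Hypothetical 4-feature data: the last two columns are near-copies
    # of the first two, so only 2 directions carry real variance.
    rng = np.random.default_rng(1)
    base = rng.normal(size=(300, 2))
    X = np.column_stack([base,
                         base + 0.01 * rng.normal(size=(300, 2))])

    Xc = X - X.mean(axis=0)                      # center the data
    cov = np.cov(Xc, rowvar=False)               # 4x4 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    explained = eigvals / eigvals.sum()
    print(explained)  # first two components carry nearly all the variance
    ```

    The eigenvectors are the orthogonal basis the answer mentions, and each eigenvalue is the variance of the data along that direction.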

    Feature Selection: Once you have a coordinate space that better describes your data, you can select which features are salient. Typically you'd use the largest eigenvalues (EVs) and their corresponding eigenvectors from PCA as your representation. Since larger EVs mean more variance in that direction of the data, you get more granularity in isolating features. This is a good way to reduce the number of dimensions of your problem.
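    Continuing the sketch above, one common (assumed, not prescribed by the answer) selection rule is to keep the smallest number of components whose cumulative explained variance reaches a threshold such as 95%, then project the data onto those eigenvectors:

    ```python
    import numpy as np

    # Same hypothetical setup: 4 features, only 2 real directions of variance.
    rng = np.random.default_rng(2)
    base = rng.normal(size=(300, 2))
    X = np.column_stack([base,
                         base + 0.01 * rng.normal(size=(300, 2))])

    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Keep the fewest components whose cumulative variance reaches 95%.
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cum, 0.95)) + 1
    X_reduced = Xc @ eigvecs[:, :k]              # project onto top-k basis

    print(k, X_reduced.shape)                    # 2 components survive here
    ```

    Note this selects derived components rather than original features; if you need interpretable original features, a separate selection method (e.g. a univariate score or model-based importance) would be used instead.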

    Of course, this could change from problem to problem; it is simply a generic guide.
