Right order of doing feature selection, PCA and normalization?

野性不改 2021-02-05 16:00

I know that feature selection helps me remove features that may have a low contribution. I know that PCA helps reduce possibly correlated features into fewer components, reducing the dimensionality. In what order should I apply normalization, PCA, and feature selection?

3 Answers
  •  野性不改
    2021-02-05 16:30

    If I were building a classifier of some sort, I would personally use this order:

    1. Normalization
    2. PCA
    3. Feature Selection

    Normalization: You normalize first to bring the data into comparable ranges. If you have data (x, y) where x ranges from -1000 to +1000 and y ranges from -1 to +1, any distance metric would automatically treat a change in y as less significant than a change in x. We don't yet know that is actually the case, so we normalize the data first.
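    The scaling step above can be sketched with plain z-score normalization in NumPy. The data here is made up to mirror the answer's example (x in [-1000, 1000], y in [-1, 1]); the variable names are mine, not from the question:

    ```python
    import numpy as np

    # Hypothetical data matching the example: x spans a much wider range
    # than y, so raw Euclidean distances would be dominated by x.
    rng = np.random.default_rng(0)
    X = np.column_stack([rng.uniform(-1000, 1000, 200),
                         rng.uniform(-1, 1, 200)])

    # z-score normalization: zero mean, unit variance per column
    X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

    print(X_norm.mean(axis=0))  # both columns now centered near 0
    print(X_norm.std(axis=0))   # both columns now have unit spread
    ```

    After this, a unit step in either column represents the same "amount" of that feature's natural variation, so distance metrics treat the two features fairly.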

    PCA: Uses the eigenvalue decomposition of the data's covariance matrix to find an orthogonal basis that describes the variance in the data points. If you have 4 features, PCA can show you that only 2 directions really differentiate the data points, which brings us to the last step.
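    A minimal NumPy sketch of that eigendecomposition view of PCA, on made-up data where 4 features secretly carry only 2 independent directions of variance (the data and names are illustrative, not from the question):

    ```python
    import numpy as np

    # Hypothetical 4-feature data: the last two columns are near-copies
    # of the first two, so only 2 directions carry real variance.
    rng = np.random.default_rng(1)
    base = rng.normal(size=(300, 2))
    X = np.column_stack([base,
                         base + 0.01 * rng.normal(size=(300, 2))])

    Xc = X - X.mean(axis=0)                      # center the data
    cov = np.cov(Xc, rowvar=False)               # 4x4 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    explained = eigvals / eigvals.sum()
    print(explained)  # first two components carry nearly all the variance
    ```

    The eigenvectors are the orthogonal basis the answer mentions, and each eigenvalue is the variance of the data along that direction.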

    Feature Selection: Once you have a coordinate space that better describes your data, you can select which features are salient. Typically you'd use the largest eigenvalues (EVs) and their corresponding eigenvectors from PCA as your representation. Since larger EVs mean more variance in that direction of the data, you get more granularity in isolating features. This is a good way to reduce the number of dimensions of your problem.
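    Continuing the sketch above, one common (assumed, not prescribed by the answer) selection rule is to keep the smallest number of components whose cumulative explained variance reaches a threshold such as 95%, then project the data onto those eigenvectors:

    ```python
    import numpy as np

    # Same hypothetical setup: 4 features, only 2 real directions of variance.
    rng = np.random.default_rng(2)
    base = rng.normal(size=(300, 2))
    X = np.column_stack([base,
                         base + 0.01 * rng.normal(size=(300, 2))])

    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Keep the fewest components whose cumulative variance reaches 95%.
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cum, 0.95)) + 1
    X_reduced = Xc @ eigvecs[:, :k]              # project onto top-k basis

    print(k, X_reduced.shape)                    # 2 components survive here
    ```

    Note this selects derived components rather than original features; if you need interpretable original features, a separate selection method (e.g. a univariate score or model-based importance) would be used instead.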

    Of course, this could change from problem to problem; it is simply a generic guide.
