How to select the top 100 features (a subset) which are most relevant after PCA?

盖世英雄少女心 2021-01-06 19:49

I performed PCA on a 63*2308 matrix and obtained a score matrix and a coefficient matrix. The score matrix is 63*2308 and the coefficient matrix is 2308*2308.

3 Answers
  • 2021-01-06 20:07

    PCA should give you both a set of eigenvectors (your coefficient matrix) and a vector of eigenvalues (1*2308), often referred to as lambda. You might need to use a different PCA function in MATLAB to get them.

    The eigenvalues indicate how much of the variance in your data each eigenvector explains. A simple method for selecting features would be to select the 100 features (principal components) with the highest eigenvalues. This gives you a set of features that explains most of the variance in the data.

    If you need to justify your approach for a write-up, you can calculate the amount of variance explained per eigenvector and cut off at, for example, 95% variance explained.
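
    The 95% cut-off could be computed roughly as follows. This is only a sketch: it assumes the newer `pca` function (Statistics and Machine Learning Toolbox) instead of `princomp`, and it uses random data as a stand-in for the real 63*2308 matrix. The names `cum_explained`, `k` and `reduced` are made up for illustration.

    ```matlab
    % Synthetic stand-in for the real data (assumption): 63 observations, 2308 variables.
    rng(0);
    X = randn(63, 2308);

    % pca centers X and returns the components sorted by decreasing variance.
    % latent holds the eigenvalues (variance of each principal component).
    [coeff, score, latent] = pca(X);

    % Cumulative fraction of the total variance explained by the first k components.
    cum_explained = cumsum(latent) / sum(latent);

    % Smallest number of components that reaches the 95% threshold.
    k = find(cum_explained >= 0.95, 1);

    % Keep only the scores of those k components.
    reduced = score(:, 1:k);
    ```

    With real data, `k` may come out well below or above 100, so it is worth checking before fixing the number of components in advance.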

    Bear in mind that selecting based solely on eigenvalue might not give you the set of features most important to your regression, so if you don't get the performance you expect, you might want to try a different feature selection method such as recursive feature selection. I would suggest using Google Scholar to find a couple of papers doing something similar and see what methods they use.


    A quick MATLAB example of taking the top 100 principal components using PCA:

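    % princomp returns the coefficients, the scores (projected data) and the eigenvalues.
    [eigenvectors, projected_data, eigenvalues] = princomp(X);
    % Order the components by decreasing eigenvalue.
    [~, feature_idx] = sort(eigenvalues, 'descend');
    % Keep the scores of the 100 components with the largest eigenvalues.
    selected_projected_data = projected_data(:, feature_idx(1:100));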
    
  • 2021-01-06 20:07

    Be careful!

    With just 63 observations and 2308 variables, your PCA result will be meaningless because the problem is underdetermined. As a rule of thumb, you should have at least three times as many observations as dimensions.

    With 63 observations, you can define at most a 62-dimensional subspace!
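
    This limit can be checked numerically. A minimal sketch, assuming random data as a stand-in for the real matrix and R2016b or later for the implicit expansion in `X - mean(X, 1)`; the variable names and the `1e-10` tolerance are illustrative choices:

    ```matlab
    % Synthetic stand-in for the real data (assumption): 63 observations, 2308 variables.
    rng(0);
    X = randn(63, 2308);

    % Center each column, as PCA does before computing the components.
    Xc = X - mean(X, 1);

    % Centering removes one degree of freedom, so the rank is at most 63 - 1.
    r = rank(Xc);

    % Eigenvalues of the covariance matrix, obtained from the singular values.
    s = svd(Xc);
    eigenvalues = s.^2 / (size(X, 1) - 1);

    % Count the eigenvalues that are not numerically zero.
    n_nonzero = nnz(eigenvalues > 1e-10 * eigenvalues(1));
    ```

    Both `r` and `n_nonzero` should come out no larger than 62, so any component beyond that carries no variance at all.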

  • 2021-01-06 20:18

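    Have you tried this?

    % Sort the values in each row in descending order.
    B = sort(your_matrix,2,'descend');
    % Keep the first 100 columns, i.e. the 100 largest values of each row.
    C = B(:,1:100);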
    