PCA with sklearn. Unable to figure out feature selection with PCA

Question

I have been trying to do some dimensionality reduction using PCA. I currently have an image of size (100, 100), and I am using a filterbank of 140 Gabor filters, where each filter gives me a response that is again a (100, 100) image. I want to do feature selection, keeping only non-redundant features, and I read that PCA might be a good way to do this.

So I created a data matrix with 10000 rows and 140 columns, where each row contains the responses of the 140 Gabor filters at one pixel. As I understand it, I can decompose this matrix using PCA as follows:

from sklearn.decomposition import PCA

pca = PCA(n_components=3)  # the class name is PCA, not pca
pca.fit(Q)  # Q is my 10000 x 140 matrix

However, I am now confused about how to figure out which of these 140 feature vectors to keep. I am guessing it should give me 3 of these 140 vectors (those corresponding to the Gabor filters that contain the most information about the image), but I have no idea how to proceed from here.

Answer 1:

PCA will give you a linear combination of features, not a selection of features. It will give you the linear combination that is the best for reconstruction in the L2 sense, aka the one that captures the most variance.

What is your goal? If you do this on a single image, any kind of selection will give you the features that best discriminate some parts of the image from other parts of the same image.

Also: Gabor filters are a sparse basis for natural images. I would not expect anything interesting to happen unless you have very specific images.

Source: https://stackoverflow.com/questions/26757412/pca-with-sklearn-unable-to-figure-out-feature-selection-with-pca

Tags

python

image-processing

scikit-learn

pca

dimensionality-reduction