dimensionality-reduction

PCA Dimensionality Reduction

和自甴很熟 submitted on 2019-12-03 03:39:07
I am trying to perform PCA to reduce 900 dimensions to 10. So far I have:

covariancex = cov(labels);
[V, d] = eigs(covariancex, 40);
pcatrain = (trainingData - repmat(mean(trainingData), 699, 1)) * V;
pcatest = (test - repmat(mean(trainingData), 225, 1)) * V;

where labels is a 1x699 vector of labels for the characters (1-26), trainingData is 699x900 (900-dimensional data for the images of 699 characters), and test is 225x900 (225 characters, 900 dimensions each). Basically I want to reduce this down to 225x10, i.e. 10 dimensions, but am kind of stuck at this point. The covariance is supposed to be computed on your trainingData: X =
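
That truncated answer points at the core issue: the covariance should be computed from the mean-centred training data, not from the label vector, and the test set must be centred with the training mean before projection. Below is a minimal NumPy sketch of that idea; the variable names and the random placeholder arrays are hypothetical, chosen only to match the 699x900 / 225x900 shapes from the question.

import numpy as np

# Hypothetical stand-ins with the shapes from the question:
# 699 training images and 225 test images, 900 features each.
rng = np.random.default_rng(0)
trainingData = rng.random((699, 900))
test = rng.random((225, 900))

# Centre both sets with the *training* mean, then take the covariance
# of the training data itself (not of the labels).
mu = trainingData.mean(axis=0)
Xc = trainingData - mu
C = np.cov(Xc, rowvar=False)               # 900 x 900 covariance matrix

# Eigen-decompose the covariance and keep the 10 leading eigenvectors.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
V = eigvecs[:, order[:10]]                 # 900 x 10 projection matrix

pcatrain = Xc @ V                          # 699 x 10
pcatest = (test - mu) @ V                  # 225 x 10
print(pcatrain.shape, pcatest.shape)

With real data this yields the 225x10 test matrix the question is after; the same V must be applied to both sets so the two projections live in the same space.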

After reducing the dimensionality of a dataset, I am getting negative feature values

折月煮酒 submitted on 2019-12-01 00:03:22
I used a dimensionality reduction method (discussion here: Random projection algorithm pseudo code) on a large dataset. After reducing the dimension from 1000 to 50, I get my new dataset, where each sample looks like:

[ 1751. -360. -2069. ..., 2694. -3295. -1764.]

Now I am a bit confused, because I don't know what negative feature values are supposed to mean. Is it okay to have negative features like this? Before the reduction, each sample looked like this: 3, 18, 18, 18, 126 ... Is this normal, or am I doing something wrong? I guess you implemented the algorithm from this paper. As the
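
Negative values are expected after a random projection: the data is multiplied by a matrix whose entries come from a zero-mean distribution (or from {-1, 0, +1} in the sparse variant), so non-negative inputs routinely map to negative outputs while pairwise distances are approximately preserved. A rough way to see this with scikit-learn's GaussianRandomProjection on made-up count-like data (the shapes and values here are hypothetical):

import numpy as np
from sklearn.random_projection import GaussianRandomProjection

# Toy non-negative data: 100 samples with 1000 count-like features.
rng = np.random.default_rng(0)
X = rng.integers(0, 200, size=(100, 1000)).astype(float)

# Project down to 50 dimensions with a Gaussian random matrix.
rp = GaussianRandomProjection(n_components=50, random_state=0)
X_small = rp.fit_transform(X)

# The projection matrix mixes features with positive and negative weights,
# so the reduced features can be negative even though X never is.
print(X_small.min(), X_small.max())

If strictly non-negative reduced features are required, a different technique (non-negative matrix factorization, for example) would be needed; plain random projection gives no such guarantee.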

LDA ignoring n_components?

£可爱£侵袭症+ submitted on 2019-11-30 09:24:10
When I try to work with LDA from scikit-learn, it keeps giving me only one component, even though I am asking for more:

>>> from sklearn.lda import LDA
>>> x = np.random.randn(5,5)
>>> y = [True, False, True, False, True]
>>> for i in range(1,6):
...     lda = LDA(n_components=i)
...     model = lda.fit(x,y)
...     model.transform(x)

gives

/Users/orthogonal/virtualenvs/osxml/lib/python2.7/site-packages/sklearn/lda.py:161: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
array([[-0.12635305],
       [-1.09293574],
       [ 1.83978459],
       [-0.37521856],
       [-0.24527725]])
array([[-0
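
This is expected behaviour rather than a bug: LDA can produce at most min(n_classes - 1, n_features) discriminant directions, and with only two classes (True/False) that ceiling is one, so asking for more components cannot yield extra columns. A small sketch of the limit, using the current import path (sklearn.discriminant_analysis; the old sklearn.lda module from the question has since been removed) and made-up random data:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))
y2 = rng.integers(0, 2, size=30)   # 2 classes -> at most 1 component
y4 = rng.integers(0, 4, size=30)   # 4 classes -> at most 3 components

for y in (y2, y4):
    # n_components=None requests the maximum allowed, min(n_classes - 1, n_features).
    lda = LinearDiscriminantAnalysis(n_components=None)
    Xt = lda.fit(X, y).transform(X)
    print(len(np.unique(y)), "classes ->", Xt.shape[1], "component(s)")

To get more than one projected dimension out of two classes, a different reducer such as PCA has to be used; the limit comes from the mathematics of LDA itself, not from the library.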

Plot PCA loadings and loading in biplot in sklearn (like R's autoplot)

霸气de小男生 submitted on 2019-11-30 05:14:34
I saw this tutorial in R with autoplot. They plotted the loadings and loading labels:

autoplot(prcomp(df), data = iris, colour = 'Species', loadings = TRUE, loadings.colour = 'blue', loadings.label = TRUE, loadings.label.size = 3)

https://cran.r-project.org/web/packages/ggfortify/vignettes/plot_pca.html

I prefer Python 3 with matplotlib, scikit-learn, and pandas for my data analysis. However, I don't know how to add these. How can you plot these vectors with matplotlib? I've been reading Recovering features names of explained_variance_ratio_ in PCA with sklearn but haven't figured it out yet
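
scikit-learn has no autoplot equivalent, but a biplot can be assembled by scattering the PCA scores and drawing one arrow per original feature using the rows of pca.components_ as loadings. A rough matplotlib sketch on the iris data; the arrow scale factor is arbitrary and purely cosmetic:

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)

pca = PCA(n_components=2)
scores = pca.fit_transform(X)          # sample coordinates (the scatter points)
loadings = pca.components_.T           # one row of (PC1, PC2) weights per feature

fig, ax = plt.subplots()
ax.scatter(scores[:, 0], scores[:, 1], c=iris.target, s=15)

# Draw each loading as an arrow from the origin and label it with the feature name.
scale = 3.0
for (dx, dy), name in zip(loadings, iris.feature_names):
    ax.arrow(0, 0, scale * dx, scale * dy, color="blue", head_width=0.05)
    ax.text(scale * dx * 1.15, scale * dy * 1.15, name, color="blue", fontsize=8)

ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
plt.show()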

How to use eigenvectors obtained through PCA to reproject my data?

只谈情不闲聊 submitted on 2019-11-29 07:42:25
I am using PCA on 100 images. My training data is a 442368x100 double matrix: 442368 features and 100 images. Here is my code for finding the eigenvectors:

[rows, cols] = size(training);
maxVec = rows;
maxVec = min(maxVec, rows);
train_mean = mean(training, 2);
A = training - train_mean * ones(1, cols);
A = A' * A;
[evec, eval] = eig(A);
[eval, ind] = sort(-1 * diag(eval));
evec = evec(:, ind(1:100));

Now evec is a 100x100 double matrix of eigenvectors, and I have 100 sorted eigenvectors. Questions: Now, if I want to transform my testing data using the eigenvectors calculated above, then how
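
The code above uses the "small covariance" trick (eigen-decomposing A'*A instead of the huge A*A'), so evec holds eigenvectors in image space; before reprojecting anything, they have to be mapped back into feature space by multiplying with A, and any test image must be centred with the same training mean. A NumPy sketch of that reprojection step, with much smaller made-up matrices standing in for the 442368x100 data:

import numpy as np

# Hypothetical stand-ins: features in rows, one column per image
# (scaled down from 442368 x 100 so the sketch runs quickly).
rng = np.random.default_rng(0)
training = rng.random((5000, 100))      # n_features x n_train_images
test_img = rng.random((5000, 1))        # one new image as a column vector

# Centre with the training mean; the same mean is reused for test data.
train_mean = training.mean(axis=1, keepdims=True)
A = training - train_mean

# Small-covariance trick: eigen-decompose A'A (100 x 100) instead of AA'.
eigvals, evec_small = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]
evec_small = evec_small[:, order]

# Map the image-space eigenvectors back to feature space and normalise them.
eigenvectors = A @ evec_small                        # n_features x 100
eigenvectors /= np.linalg.norm(eigenvectors, axis=0)

# Reproject: centre the test image, then take its coordinates in the new basis.
test_proj = eigenvectors.T @ (test_img - train_mean) # 100 x 1
print(test_proj.shape)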

How to efficiently find k-nearest neighbours in high-dimensional data?

梦想与她 submitted on 2019-11-27 05:43:13
Question: So I have about 16,000 75-dimensional data points, and for each point I want to find its k nearest neighbours (using Euclidean distance; currently k=2 if this makes it easier). My first thought was to use a kd-tree for this, but as it turns out they become rather inefficient as the number of dimensions grows. In my sample implementation, it's only slightly faster than exhaustive search. My next idea would be to use PCA (Principal Component Analysis) to reduce the number of dimensions, but I was
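
For roughly 16,000 points in 75 dimensions, two pragmatic options are an exact brute-force search (vectorised distance computations are often competitive with trees at this dimensionality) or reducing the dimensionality first and then using a tree, at the cost of the neighbours becoming approximate. A scikit-learn sketch of both, on random stand-in data with the sizes from the question:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

# Stand-in for the question's data: 16,000 points in 75 dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((16000, 75))

# Option 1: exact brute-force search. n_neighbors=3 because the nearest
# neighbour of a training point is the point itself, leaving k=2 real ones.
nn = NearestNeighbors(n_neighbors=3, algorithm="brute").fit(X)
dist, idx = nn.kneighbors(X)
print(idx[:, 1:].shape)                 # the 2 nearest neighbours of every point

# Option 2: PCA down to 20 dimensions, then a kd-tree. Faster queries,
# but neighbours are approximate since distances change in the reduced space.
X_low = PCA(n_components=20).fit_transform(X)
nn_low = NearestNeighbors(n_neighbors=3, algorithm="kd_tree").fit(X_low)
dist_low, idx_low = nn_low.kneighbors(X_low)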