Question
I am using CCA (canonical correlation analysis) in my work and want to understand something.
This is my MATLAB code. I have used only 100 samples, to better understand the concepts of CCA.
clc;clear all;close all;
load carbig;
data = [Displacement Horsepower Weight Acceleration MPG];
data(isnan(data))=0;
X = data(1:100,1:3);
Y = data(1:100,4:5);
[wx,wy,~,U,V] = CCA(X,Y);
clear Acceleration Cylinders Displacement Horsepower MPG Mfg Model Model_Year Origin Weight when org
subplot(1,2,1),plot(U(:,1),V(:,1),'.');
subplot(1,2,2),plot(U(:,2),V(:,2),'.');
My plots look like this:
This shows that in the first plot (left), the transformed variables are highly correlated, with little scatter around the central axis, while in the second plot (right) there is much more scatter around the central axis.
From this I understand that CCA maximizes the correlation between the data in the transformed space. So I tried to design a matching score that should return its minimum value when the vectors are maximally correlated: I matched each row vector U(i,:) against every row vector V(j,:), with i and j going from 1 to 100.
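%% Matching score: distance between row U(i,:) and every row V(j,:)
n = size(U,1);
c = zeros(n);        % c(i,j) = Euclidean distance ||U(i,:) - V(j,:)||
idx = zeros(1,n);    % idx(i) = index of the row of V closest to U(i,:)
for i=1:n
    cost = repmat(U(i,:),n,1) - V;
    for j=1:n
        c(i,j) = norm(cost(j,:),2);
    end
    [~,idx(i)] = min(c(i,:));
end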
Ideally, idx should look like this:
idx = 1 2 3 4 5 6 7 8 9 10 ....
since the corresponding vectors are maximally correlated. However, my output looks something like this:
idx = 80 5 3 1 4 7 17 17 17 10 68 78 78 75 9 10 5 1 6 17 .....
I don't understand why this happens.
- Am I wrong somewhere? Aren't the vectors supposed to be maximally correlated in the transformed CCA subspace?
- If my assumption above is wrong, please point me in the right direction.
Thanks in advance.
Answer 1:
First, let me port your code to R2014b, using the built-in canoncorr function:
load carbig;
data = [Displacement Horsepower Weight Acceleration MPG];
% Truncate the data, to follow up on your sample code
data = data(1:100,:);
% Drop every row that contains a NaN (instead of setting NaNs to 0)
nans = sum(isnan(data),2) > 0;
[wx, wy, r, U, V] = canoncorr(data(~nans,1:3), data(~nans,4:5));
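One change in the ported code is easy to miss: rows containing a NaN are dropped, whereas the question set every NaN to 0. A minimal stdlib Python sketch with made-up numbers (not carbig data) shows how zero-filling can distort a correlation: the artificial 0 becomes an extreme point. The helper name `pearson` is hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


# Made-up two-column data; the last row has a missing value in column 2.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0), (5.0, float("nan"))]

# Question's approach: data(isnan(data)) = 0
zero_filled = [tuple(0.0 if math.isnan(v) else v for v in r) for r in rows]
# Answer's approach: keep only rows without any NaN
complete = [r for r in rows if not any(math.isnan(v) for v in r)]

r_zero = pearson([r[0] for r in zero_filled], [r[1] for r in zero_filled])
r_drop = pearson([r[0] for r in complete], [r[1] for r in complete])
print(r_zero, r_drop)  # prints 0.0 1.0
```

Here the four complete rows are perfectly correlated, but the single zero-filled row cancels the correlation entirely.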
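OK, now the trick is that the vectors which are maximally correlated in the CCA subspace are the column vectors U(:,1) and V(:,1), and U(:,2) and V(:,2), not the row vectors U(i,:) that you are trying to match. Each row U(i,:) just holds the canonical scores of a single sample; the correlation is measured across samples, down each column. So in the CCA subspace, the vectors to compare are N-dimensional, where N is the number of samples (here at most 100, since rows containing NaN are dropped), not simple 2-D vectors. That's the reason why visualizing CCA results is often quite complicated!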
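The same point in symbols, using the standard definition of CCA with the column notation U(:,k), V(:,k) from this answer: write $\tilde{X} = X - \mathbf{1}\bar{x}^\top$ and $\tilde{Y} = Y - \mathbf{1}\bar{y}^\top$ for the centered data, and $a_k$, $b_k$ for the $k$-th columns of the coefficient matrices $A$, $B$ (wx and wy in the code).

$$
U = \tilde{X}A, \qquad V = \tilde{Y}B, \qquad (a_k, b_k) = \arg\max_{a,\,b}\ \operatorname{corr}(\tilde{X}a,\ \tilde{Y}b)
$$

subject to $\tilde{X}a$ being uncorrelated with $U_{:,1},\dots,U_{:,k-1}$ and $\tilde{Y}b$ with $V_{:,1},\dots,V_{:,k-1}$. Because the columns of $U$ and $V$ have zero mean,

$$
\rho_k = \operatorname{corr}(U_{:,k}, V_{:,k}) = \frac{\sum_{i=1}^{N} U_{ik}V_{ik}}{\sqrt{\sum_{i=1}^{N} U_{ik}^2}\,\sqrt{\sum_{i=1}^{N} V_{ik}^2}}, \qquad \rho_1 \ge \rho_2 \ge \cdots
$$

The sum runs over the samples $i$, so a single row $U_{i,:}$ only contributes one term to each $\rho_k$; nothing in the objective makes $U_{i,:}$ close to $V_{i,:}$ in Euclidean distance.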
By the way, the canonical correlations are given by the third output of canoncorr, which you (intentionally?) chose to skip in your code. If you check its contents, you'll see that the correlations are sorted in decreasing order, so the first pair of columns is the most correlated:
r =
0.9484 0.5991
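To see why the row-wise matching from the question fails even with correlations like these, here is a small stdlib Python sketch. It does not run CCA: as an assumption, it simulates two pairs of standardized canonical score columns directly, with population correlations set to the r values above, and then applies the question's matching score (nearest row of V by Euclidean distance). The names `simulate_scores`, `match_rows` and `pearson` are hypothetical, and indices are 0-based, so a perfect match would mean `idx[i] == i`.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (one column pair)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def simulate_scores(n, rhos, seed=0):
    """Hypothetical scores: columns k of U and V have population correlation rhos[k]."""
    rng = random.Random(seed)
    U = [[0.0] * len(rhos) for _ in range(n)]
    V = [[0.0] * len(rhos) for _ in range(n)]
    for i in range(n):
        for k, rho in enumerate(rhos):
            u = rng.gauss(0.0, 1.0)
            noise = rng.gauss(0.0, 1.0)
            U[i][k] = u
            V[i][k] = rho * u + math.sqrt(1.0 - rho ** 2) * noise
    return U, V


def match_rows(U, V):
    """The question's score: idx[i] = argmin_j ||U[i] - V[j]|| (Euclidean)."""
    idx = []
    for ui in U:
        dists = [math.dist(ui, vj) for vj in V]
        idx.append(min(range(len(V)), key=dists.__getitem__))
    return idx


U, V = simulate_scores(100, [0.9484, 0.5991])
col_corr = [pearson([row[k] for row in U], [row[k] for row in V]) for k in range(2)]
idx = match_rows(U, V)
hits = sum(1 for i, j in enumerate(idx) if i == j)
print("column correlations:", [round(c, 3) for c in col_corr])
print("rows matched to themselves:", hits, "of", len(idx))
```

Both column correlations come out clearly positive, yet only a small fraction of rows are matched to themselves: the per-sample noise in each canonical variable is large compared with the typical distance between neighbouring samples, which is roughly the situation the question's idx output reflects.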
It is hard to explain CCA better than the link you already provided. If you want to go further, you should probably invest in a book, like this one or this one.
Source: https://stackoverflow.com/questions/28234644/making-sense-of-cca-matlab-implementation-2