I am using CCA for my work and want to understand something.
This is my MATLAB code. I have only taken 100 samples to better understand the concepts of CCA.
    clc; clear all; close all;
    load carbig;
    data = [Displacement Horsepower Weight Acceleration MPG];
    data(isnan(data)) = 0;   % replace missing values with 0
    X = data(1:100,1:3);     % Displacement, Horsepower, Weight
    Y = data(1:100,4:5);     % Acceleration, MPG
    [wx,wy,~,U,V] = CCA(X,Y);
    clear Acceleration Cylinders Displacement Horsepower MPG Mfg Model Model_Year Origin Weight when org
    % Plot each pair of canonical variates against each other
    subplot(1,2,1), plot(U(:,1),V(:,1),'.');
    subplot(1,2,2), plot(U(:,2),V(:,2),'.');
My plots look like this:
In the first plot (left), the transformed variables are highly correlated, with little scatter around the central axis. In the second plot (right), the scatter around the central axis is much larger.
As I understand from here, CCA maximizes the correlation between the data in the transformed space. So I tried to design a matching score that should return a minimum value if the vectors are maximally correlated. I tried to match each row vector U(i,:) with each row vector V(j,:), with i and j going from 1 to 100.
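    %% Finding the difference between the projected vectors
    n = size(U,1);
    c = zeros(n,n);      % c(i,j) = distance between U(i,:) and V(j,:)
    idx = zeros(1,n);    % idx(i) = row of V closest to U(i,:)
    for i = 1:n
        cost = repmat(U(i,:),n,1) - V;
        for j = 1:n
            c(i,j) = norm(cost(j,:));   % Euclidean (2-)norm
        end
        [~,idx(i)] = min(c(i,:));
    end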
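To spell out the two quantities I am comparing (this is my understanding of the standard CCA objective, and I am assuming the `CCA` function I use follows it; $\Sigma_{XX}$, $\Sigma_{YY}$, $\Sigma_{XY}$ are my notation for the sample covariance matrices of the centered data): the first pair of weight vectors is chosen to maximize a correlation computed over all 100 samples,

$$
\rho_1 = \max_{a,\,b}\ \operatorname{corr}(Xa,\, Yb)
       = \max_{a,\,b}\ \frac{a^\top \Sigma_{XY}\, b}{\sqrt{a^\top \Sigma_{XX}\, a}\ \sqrt{b^\top \Sigma_{YY}\, b}},
$$

with the first columns of U and V being the projections $Xa$ and $Yb$ at the maximum. My matching score, on the other hand, compares individual rows:

$$
c_{ij} = \lVert U_{i,:} - V_{j,:} \rVert_2, \qquad \mathrm{idx}_i = \arg\min_{j}\ c_{ij}.
$$

So the first is a statement about columns of U and V across all samples, while the second is a statement about single rows. I am not sure whether a high $\rho_1$ should imply that $c_{ii}$ is the smallest entry in row $i$.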
Ideally, idx should look like this:
idx = 1 2 3 4 5 6 7 8 9 10 ....
as they are maximally correlated. However, my output looks something like this:
idx = 80 5 3 1 4 7 17 17 17 10 68 78 78 75 9 10 5 1 6 17 .....
I don't understand why this happens.
- Am I wrong somewhere? Aren't the vectors supposed to be maximally correlated in the transformed CCA subspace?
- If my assumption above is wrong, please point me in the right direction.
Thanks in advance.