Gaussian Mixture Model in MATLAB - Calculation of the Empirical Variance Covariance Matrix

Question

I am having trouble reconciling some basic theoretical results on Gaussian mixtures with the output of the gmdistribution and random commands in MATLAB.

Consider a mixture of two independent 3-variate normal distributions with weights 1/2 and 1/2.

The first distribution A is characterised by mean and variance-covariance matrix equal to

muA=[-1.4 3.2 -1.9]; %mean vector
rhoA=-0.5; %correlation among components in A
sigmaA=[1 rhoA rhoA; rhoA 1 rhoA; rhoA rhoA 1]; %variance-covariance matrix of A

The second distribution B is characterised by mean and variance-covariance matrix equal to

muB=[1.2 -1.6 1.5]; %mean vector
rhoB=0.3; %correlation among components in B
sigmaB=[1 rhoB rhoB; rhoB 1 rhoB; rhoB rhoB 1]; %variance-covariance matrix of B

Let epsilon be the 3-variate random vector distributed as the mixture. My calculations suggest that the expected value of epsilon should be

Mtheory=1/2*(muA+muB);

and the variance-covariance matrix should be

Vtheory=1/4*[2 rhoA+rhoB rhoA+rhoB; rhoA+rhoB 2 rhoA+rhoB; rhoA+rhoB rhoA+rhoB 2];

Let's now try to see whether Mtheory and Vtheory coincide with the empirical moments that we get by drawing many random numbers from the mixture.

clear
rng default 

n=10^6; %number of draws 

w = ones(1,2)/2; %weights 

rhoA=-0.5; %correlation among components of A
rhoB=0.3; %correlation among components of B

muA=[-1.4 3.2 -1.9]; %mean vector of A
muB=[1.2 -1.6 1.5]; %mean vector of B
mu = [muA;muB];    
%Variance-covariance matrix for mixing
sigmaA=[1 rhoA rhoA; rhoA 1 rhoA; rhoA rhoA 1]; %variance-covariance matrix of A
sigmaB=[1 rhoB rhoB; rhoB 1 rhoB; rhoB rhoB 1]; %variance-covariance matrix of B 
sigma = cat(3,sigmaA,sigmaB);

obj = gmdistribution(mu, sigma,w);

%Draws
epsilon = random(obj, n); 

M=mean(epsilon);
V=cov(epsilon);
Mtheory=1/2*(muA+muB);
Vtheory=1/4*[2 rhoA+rhoB rhoA+rhoB; rhoA+rhoB 2 rhoA+rhoB; rhoA+rhoB rhoA+rhoB 2];

Question: M and Mtheory almost coincide. V and Vtheory are completely different. What am I doing wrong? I must be doing something very silly, but I don't see what.

Answer 1:

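When you calculate the covariance, note that your data isn't centered: the two components have different, nonzero means, and the spread between those means contributes to the covariance of the mixture.
Moreover, your 1/4 factor is wrong. It would be correct for the average (A+B)/2 of two independent random vectors, which is a scaling of the variables.
A mixture is not a scaling but a selection: each draw comes from either A or B, each with probability 1/2.
The calculation should be done using the Law of Total Variance / Law of Total Covariance,
where the "given event" is the mixture index.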
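
As a sketch of what that law gives here, write Z for the mixture index, and w_k, \mu_k, \Sigma_k for the weight, mean (as a column vector) and covariance of component k. These symbols are introduced only for illustration and do not appear in the question.

```latex
\begin{aligned}
\mu &= \operatorname{E}[\epsilon] = \sum_k w_k \mu_k, \\
\operatorname{Cov}(\epsilon)
  &= \operatorname{E}\!\left[\operatorname{Cov}(\epsilon \mid Z)\right]
   + \operatorname{Cov}\!\left(\operatorname{E}[\epsilon \mid Z]\right) \\
  &= \sum_k w_k \Sigma_k + \sum_k w_k (\mu_k - \mu)(\mu_k - \mu)^{\top}.
\end{aligned}
% Two components with weights 1/2 each:
% Cov(\epsilon) = \tfrac12(\Sigma_A + \Sigma_B)
%               + \tfrac14(\mu_A - \mu_B)(\mu_A - \mu_B)^{\top}
```

The first sum is the average within-component covariance. The second is the between-component term, which is zero only when all component means coincide.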

A worked example of the calculation is given in "Calculation of the Covariance of Gaussian Mixtures".
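
For the numbers in the question, the law of total covariance can be evaluated directly. The following is a minimal sketch, not a definitive implementation; the names Vwithin, Vbetween and VtheoryCorrected are made up for illustration, and the code uses only base MATLAB operations.

```matlab
% Sketch: theoretical covariance of the two-component mixture via the
% law of total covariance. All parameter values are copied from the question.
w = [0.5 0.5];                      % mixture weights
rhoA = -0.5;                        % correlation among components in A
rhoB = 0.3;                         % correlation among components in B
muA = [-1.4 3.2 -1.9];              % mean vector of A (row vector)
muB = [1.2 -1.6 1.5];               % mean vector of B (row vector)
sigmaA = [1 rhoA rhoA; rhoA 1 rhoA; rhoA rhoA 1];
sigmaB = [1 rhoB rhoB; rhoB 1 rhoB; rhoB rhoB 1];

% Mixture mean: weighted average of the component means
Mtheory = w(1)*muA + w(2)*muB;

% Within-component part: weighted average of the component covariances
Vwithin = w(1)*sigmaA + w(2)*sigmaB;

% Between-component part: spread of the component means around Mtheory.
% The means are row vectors, so d'*d is the 3x3 outer product.
dA = muA - Mtheory;
dB = muB - Mtheory;
Vbetween = w(1)*(dA'*dA) + w(2)*(dB'*dB);

% Hypothetical name for the corrected theoretical covariance
VtheoryCorrected = Vwithin + Vbetween;
disp(VtheoryCorrected)
```

With these inputs the diagonal of VtheoryCorrected is 2.69, 6.76, 3.89, against 0.5 in the question's Vtheory. The empirical V = cov(epsilon) from the simulation should be close to VtheoryCorrected up to sampling error.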

Source: https://stackoverflow.com/questions/50875160/gaussian-mixture-model-in-matlab-calculation-of-the-empirical-variance-covaria

Tags

matlab

statistics

normal-distribution

mixture-model