Question
I am not sure whether this is a programming or a statistics question, but I am 99% sure that the problem is numerical, so maybe a programmatic solution can be proposed.
I am using MATLAB's mvnpdf function to calculate the multivariate Gaussian PDF of some observations. I frequently get "SIGMA must be symmetric and positive definite" errors.
However, I am obtaining the covariance matrix from the data, so it should be valid. Code to reproduce the problem:
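err_cnt = 0;
for i = 1:1000
    try
        a = rand(3);        % 3 observations (rows) of 3 variables (columns)
        c = cov(a);         % sample covariance matrix
        m = mean(a);        % sample mean, one entry per column
        mvnpdf(a, m, c);
    catch me
        err_cnt = err_cnt + 1;   % count the runs in which mvnpdf throws
    end
end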
I get ~500-600 errors each time I run it.
P.S. My real data is not random; I generate it here only to demonstrate the problem.
Answer 1:
This happens when the covariance matrix is singular or nearly so, i.e. when one of its eigenvalues is (very close to) zero. A simple fix is to add a very small constant to the diagonal of c.
err_cnt = 0;
for i = 1:1000
    try
        a = rand(3);
        c = cov(a) + .0001 * eye(3);   % small constant added to the diagonal
        m = mean(a);
        mvnpdf(a, m, c);
    catch me
        err_cnt = err_cnt + 1;
    end
end
This results in 0 errors.
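A fixed constant such as .0001 implicitly assumes that the data is on a scale of order one. The following is a minimal sketch, not part of the original answer, that scales the added term by the average variance instead. The name `rel` and its value 1e-6 are assumptions chosen for illustration only.

```matlab
a = rand(3);                                 % 3 observations of 3 variables, as in the question
c = cov(a);
k = size(c, 1);                              % number of variables

rel = 1e-6;                                  % assumed relative size of the added term
c_reg = c + rel * (trace(c) / k) * eye(k);   % add rel times the average variance to the diagonal

[~, flag] = chol(c_reg);                     % flag == 0 means the Cholesky factorization succeeded
y = mvnpdf(a, mean(a), c_reg);               % density evaluated with the regularized covariance
```

The two-output form of chol does not throw when the factorization fails, so `flag` can be checked before mvnpdf is called.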
Answer 2:
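This is a linear algebra problem rather than a programming one. Recall the formula for the PDF of a k-dimensional multivariate normal distribution with mean \mu and covariance \Sigma:
$$f(x) = \frac{1}{\sqrt{(2\pi)^k \, |\Sigma|}} \exp\left( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)$$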
When your matrix is not strictly positive definite (i.e., it is singular), the determinant in the denominator is zero and the inverse in the exponent is not defined, which is why you're getting the errors.
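However, it is a common misconception that covariance matrices must be positive definite. This is not true: covariance matrices only need to be positive semidefinite. It is perfectly possible for your data to have a covariance matrix that is singular. Also, since what you're forming is the sample covariance matrix of your observed data, you can have singularities arising from not having sufficient observations. The sample covariance of n observations has rank at most n - 1, so the 3 observations of 3 variables in the question always give a singular matrix, and round-off decides whether the factorization inside mvnpdf happens to succeed.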
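The effect of too few observations can be checked directly. This is a rough sketch using the same rand(3) setup as the question; the 100-row matrix `a2` is an assumed example size, not something from the original post.

```matlab
a = rand(3);                  % n = 3 observations of k = 3 variables
c = cov(a);
lambda = eig(c);              % smallest eigenvalue is zero up to round-off
disp(lambda.');
[~, flag] = chol(c);          % flag > 0 means Cholesky failed; here round-off decides

a2 = rand(100, 3);            % assumed: many more observations than variables
c2 = cov(a2);
[~, flag2] = chol(c2);        % flag2 == 0: c2 is numerically positive definite
disp(min(eig(c2)));           % clearly above zero
```

With only 3 rows the smallest eigenvalue sits at round-off level and may come out slightly negative or slightly positive, which would explain why only about half of the runs fail.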
Answer 3:
When your data lives in a subspace (singular covariance matrix), the probability density is singular in the full space. Loosely speaking, this means that the density is infinite at each point of the subspace, which is not very useful. Therefore, if this is the case, and it is NOT a numerical artifact, you may want to consider the probability density in the subspace that the data spans, where the density is well defined. Adding a diagonal value, as @Junuxx suggests, results in very different values in this case.
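One way to evaluate the density within the spanned subspace is to project the centered data onto the eigenvectors that carry non-negligible variance and call mvnpdf in that lower-dimensional space. This is only a sketch: the cutoff `tol` and the variable names are assumptions, and the result is a density with respect to the subspace, so it is not comparable to a full-space density.

```matlab
a = rand(3);                                 % 3 observations (rows) of 3 variables
m = mean(a);
c = cov(a);

[V, D] = eig((c + c.') / 2);                 % symmetrize to guard against round-off
lambda = diag(D);
tol = 1e-10 * max(lambda);                   % assumed cutoff for "numerically zero"
keep = lambda > tol;                         % directions with non-negligible variance
r = nnz(keep);                               % dimension of the subspace the data spans

z = bsxfun(@minus, a, m) * V(:, keep);       % coordinates of the centered data in the subspace
p = mvnpdf(z, zeros(1, r), diag(lambda(keep)));   % r-dimensional Gaussian density
```

In the eigenvector basis the covariance is diagonal, so the r-dimensional covariance is simply the diagonal matrix of the retained eigenvalues, which is positive definite by construction.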
Source: https://stackoverflow.com/questions/11269715/matlabs-sigma-must-be-symmetric-and-positive-definite-error-sometimes-not-mak