KDE fails with two points?

假如想象 submitted on 2019-12-10 19:09:18

Question


The following trivial example returns a singular matrix. Why? Any ways to overcome it?

In:  from scipy.stats import gaussian_kde

In:  points
Out: (array([63, 84]), array([46, 42]))

In:  gaussian_kde(points)

LinAlgError: singular matrix

Answer 1:


The backtrace shows that the call fails while inverting the covariance matrix. The cause is exact multicollinearity in your data. According to the linked page, data are multicollinear if two variables are collinear, i.e. if

the correlation between two independent variables is equal to 1 or -1

In this case each variable has only two samples, and two samples are always collinear (trivially, there is always a line passing through two distinct points). We can check that:

np.corrcoef(np.array([63, 84]), np.array([46, 42]))
[[ 1. -1.]
 [-1.  1.]]

Two variables need at least n=3 samples before they can fail to be collinear. ali_m's answer adds that the number of samples n must also be large enough relative to the number of variables p. More precisely, once the mean is subtracted, n samples give a covariance matrix of rank at most n-1, so it can be nonsingular only if

n>=p+1

In this case p=2, so n>=3 is the right constraint.
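
The rank argument can be checked numerically. The following is only a minimal sketch using NumPy: the helper name covariance_rank, the relative tolerance 1e-10, and the third sample (70, 50) are assumptions made for illustration and are not part of the original question.

```python
import numpy as np

# The two samples from the question: p = 2 variables (rows), n = 2 samples (columns).
points = np.array([[63.0, 84.0],
                   [46.0, 42.0]])


def covariance_rank(data, rel_tol=1e-10):
    """Numerical rank of the sample covariance of `data` (rows = variables).

    `rel_tol` is an illustrative threshold relative to the largest singular value.
    """
    singular_values = np.linalg.svd(np.cov(data), compute_uv=False)
    return int(np.sum(singular_values > rel_tol * singular_values[0]))


# Hypothetical third sample, chosen off the line through the first two.
with_third = np.column_stack([points, [70.0, 50.0]])

rank_two_points = covariance_rank(points)        # covariance is rank-deficient
rank_three_points = covariance_rank(with_third)  # covariance has full rank
print(rank_two_points, rank_three_points)
```

With two samples the 2x2 covariance matrix has rank 1 and cannot be inverted; with a third, non-collinear sample it reaches full rank 2.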




Answer 2:


The error occurs when gaussian_kde() tries to take the inverse of the covariance matrix of your input data. For the covariance matrix to be nonsingular, the number of (non-identical) points in your input must be greater than the number of variables, and the points must not all lie on a single line. Try adding a third point that is not on the line through the first two and you should see that it works.

This answer on Cross Validated has a proper explanation for why this is the case.
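
As a rough illustration of the suggestion to add a third point, the sketch below appends a hypothetical sample (70, 50) to the two points from the question. The extra sample and the evaluation location (70, 46) are assumptions chosen only so that the three points are not collinear.

```python
import numpy as np
from scipy.stats import gaussian_kde

# The two original samples plus a hypothetical third one, (70, 50),
# chosen so that the three points do not lie on one line.
# Rows are variables, columns are samples, as gaussian_kde expects.
points = np.array([[63.0, 84.0, 70.0],
                   [46.0, 42.0, 50.0]])

kde = gaussian_kde(points)  # the covariance matrix is now invertible

# Evaluate the density estimate at the single location (70, 46).
density = kde.evaluate([[70.0], [46.0]])
print(kde.d, kde.n, density)
```

Three samples are the bare minimum for two variables; the resulting estimate is valid but based on very little data.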



Source: https://stackoverflow.com/questions/19261858/kde-fails-with-two-points
