Weighted Gaussian kernel density estimation in `python`

99封情书 posted on 2019-12-30 01:01:26

Question


It is currently not possible to use scipy.stats.gaussian_kde to estimate the density of a random variable based on weighted samples. What methods are available to estimate densities of continuous random variables based on weighted samples?


Answer 1:


Neither sklearn.neighbors.KernelDensity nor statsmodels.nonparametric seems to support weighted samples. I modified scipy.stats.gaussian_kde to allow for heterogeneous sampling weights and thought the results might be useful for others. An example is shown below.

An ipython notebook can be found here: http://nbviewer.ipython.org/gist/tillahoffmann/f844bce2ec264c1c8cb5

Implementation details

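The weighted arithmetic mean, for samples $x_i$ with weights $w_i$ normalised so that $\sum_i w_i = 1$, is

$$\mu = \sum_{i=1}^{N} w_i x_i.$$

The unbiased data covariance matrix is then given by

$$\Sigma = \frac{1}{1 - \sum_{i=1}^{N} w_i^2} \sum_{i=1}^{N} w_i (x_i - \mu)(x_i - \mu)^T.$$

The bandwidth can be chosen by the scott or silverman rules as in scipy. However, the number of samples used to calculate the bandwidth is Kish's approximation for the effective sample size,

$$n_{\mathrm{eff}} = \frac{1}{\sum_{i=1}^{N} w_i^2}.$$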
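The quantities named above (weighted mean, unbiased weighted variance, Kish's effective sample size, Scott's rule) can be illustrated with a minimal univariate sketch. This is not the code from the notebook: the helper name `weighted_gaussian_kde` is hypothetical, only the one-dimensional case is handled (so the covariance matrix reduces to a variance), and the Scott factor is assumed to be `n_eff ** (-1 / 5)` as scipy uses for one dimension.

```python
import math


def weighted_gaussian_kde(samples, weights):
    """Hypothetical helper: weighted Gaussian KDE for 1-D samples."""
    total = sum(weights)
    w = [wi / total for wi in weights]  # normalise so the weights sum to 1

    # Weighted arithmetic mean
    mean = sum(wi * xi for wi, xi in zip(w, samples))

    # Kish's effective sample size for normalised weights
    sum_w2 = sum(wi * wi for wi in w)
    n_eff = 1.0 / sum_w2

    # Unbiased weighted variance (1-D analogue of the covariance matrix)
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, samples)) / (1.0 - sum_w2)

    # Scott's rule in one dimension, with n replaced by n_eff (assumption)
    h = math.sqrt(var) * n_eff ** (-1.0 / 5.0)

    def density(x):
        # Weighted sum of Gaussian kernels centred on the samples
        norm = 1.0 / (h * math.sqrt(2.0 * math.pi))
        return sum(wi * norm * math.exp(-0.5 * ((x - xi) / h) ** 2)
                   for wi, xi in zip(w, samples))

    return density, mean, var, n_eff, h
```

With equal weights the variance reduces to the usual sample variance with the `N - 1` denominator and `n_eff` equals the number of samples, which is a quick sanity check on the formulas.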




Answer 2:


For univariate distributions you can use KDEUnivariate from statsmodels. It is not well documented, but the fit method accepts a weights argument. Weights are not supported by the FFT-based computation, so fft=False must be passed as well. Here is an example:

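import matplotlib.pyplot as plt
import numpy as np
from statsmodels.nonparametric.kde import KDEUnivariate

# Unweighted reference: the observation 10 is repeated three times.
kde1 = KDEUnivariate(np.array([10., 10., 10., 5.]))
kde1.fit(bw=0.5)
plt.plot(kde1.support, [kde1.evaluate(xi) for xi in kde1.support], 'x-')

# Weighted version: a single observation at 10 carrying weight 3.
# Weights require fft=False.
kde2 = KDEUnivariate(np.array([10., 5.]))
kde2.fit(weights=np.array([3., 1.]),
         bw=0.5,
         fft=False)
plt.plot(kde2.support, [kde2.evaluate(xi) for xi in kde2.support], 'o-')
plt.show()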

which plots both estimates on the same axes (the figure is not reproduced here).
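In principle the two estimates in the example describe the same density: with a fixed bandwidth, putting weight 3 on a single observation is equivalent to repeating that observation three times, provided the weights are normalised. A library-independent check of that idea, assuming a plain Gaussian kernel (the helper `gauss_kde` is hypothetical and not part of statsmodels):

```python
import math


def gauss_kde(x, data, bw, weights=None):
    """Hypothetical helper: Gaussian KDE at point x with a fixed bandwidth."""
    if weights is None:
        weights = [1.0] * len(data)
    total = sum(weights)  # normalise so the weights sum to 1
    norm = 1.0 / (bw * math.sqrt(2.0 * math.pi))
    return sum(w / total * norm * math.exp(-0.5 * ((x - xi) / bw) ** 2)
               for w, xi in zip(weights, data))


# Evaluation grid from 4.0 to 11.0 in steps of 0.5
grid = [4.0 + 0.5 * i for i in range(15)]

# Same data as the example: replicated samples vs. weighted samples
replicated = [gauss_kde(x, [10.0, 10.0, 10.0, 5.0], bw=0.5) for x in grid]
weighted = [gauss_kde(x, [10.0, 5.0], bw=0.5, weights=[3.0, 1.0]) for x in grid]

# Largest pointwise difference between the two estimates
max_diff = max(abs(a - b) for a, b in zip(replicated, weighted))
```

Here `max_diff` is zero up to floating-point rounding. Note that the equivalence relies on the bandwidth being fixed; a data-driven bandwidth rule may treat the two inputs differently.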




Answer 3:


Check out the packages PyQt-Fit and Statistics for Python. Both appear to offer kernel density estimation with weighted observations.



Source: https://stackoverflow.com/questions/27623919/weighted-gaussian-kernel-density-estimation-in-python
