Question
It is currently not possible to use scipy.stats.gaussian_kde to estimate the density of a random variable based on weighted samples. What methods are available to estimate densities of continuous random variables based on weighted samples?
Answer 1:
Neither sklearn.neighbors.KernelDensity nor statsmodels.nonparametric seems to support weighted samples. I modified scipy.stats.gaussian_kde to allow for heterogeneous sampling weights and thought the results might be useful for others.
An IPython notebook can be found here: http://nbviewer.ipython.org/gist/tillahoffmann/f844bce2ec264c1c8cb5
Implementation details
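The weighted arithmetic mean is

$$\mu = \frac{\sum_i w_i x_i}{\sum_i w_i}.$$

The unbiased data covariance matrix is then given by

$$\Sigma = \frac{\sum_i w_i}{\left(\sum_i w_i\right)^2 - \sum_i w_i^2} \sum_i w_i (x_i - \mu)(x_i - \mu)^T.$$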
The bandwidth can be chosen by the scott or silverman rules as in scipy. However, the number of samples used to calculate the bandwidth is Kish's approximation for the effective sample size.
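As a rough illustration of these quantities, here is a minimal univariate sketch in plain Python. It is not the code from the notebook: the function names `weighted_summary` and `scott_bandwidth` are hypothetical, the weights are assumed to be reliability weights that get normalized to sum to one, and the Scott factor is assumed to take the form `n ** (-1 / (d + 4))` with `n` replaced by Kish's effective sample size.

```python
import math


def weighted_summary(samples, weights):
    """Weighted mean, unbiased weighted variance, and Kish's effective sample size (1-D)."""
    total = sum(weights)
    w = [wi / total for wi in weights]  # normalize weights to sum to one
    mean = sum(wi * xi for wi, xi in zip(w, samples))
    sum_w2 = sum(wi * wi for wi in w)
    # Unbiased variance for normalized weights: divide by 1 - sum(w_i^2)
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, samples)) / (1.0 - sum_w2)
    n_eff = 1.0 / sum_w2  # Kish's approximation for normalized weights
    return mean, var, n_eff


def scott_bandwidth(var, n_eff, d=1):
    """Scott's rule with the sample count replaced by the effective sample size (assumed form)."""
    return math.sqrt(var) * n_eff ** (-1.0 / (d + 4))


mean, var, n_eff = weighted_summary([10.0, 5.0], [3.0, 1.0])
print(mean, var, n_eff, scott_bandwidth(var, n_eff))
```

With equal weights the effective sample size reduces to the ordinary sample count and the variance to the usual n - 1 estimator, which is a quick sanity check on the normalization.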
Answer 2:
For univariate distributions you can use KDEUnivariate from statsmodels. It is not well documented, but the fit method accepts a weights argument. When weights are given, FFT cannot be used, so pass fft=False. Here is an example:
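import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.kde import KDEUnivariate

# Unweighted estimate: the sample 10 is repeated three times
kde1 = KDEUnivariate(np.array([10., 10., 10., 5.]))
kde1.fit(bw=0.5)
plt.plot(kde1.support, [kde1.evaluate(xi) for xi in kde1.support], 'x-')

# Weighted estimate: the same data as two points with weights 3 and 1
kde2 = KDEUnivariate(np.array([10., 5.]))
kde2.fit(weights=np.array([3., 1.]), bw=0.5, fft=False)  # weights require fft=False
plt.plot(kde2.support, [kde2.evaluate(xi) for xi in kde2.support], 'o-')
plt.show()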
This plots both estimates on the same axes: the repeated sample (crosses) and the weighted sample (circles).
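The comparison can also be checked numerically without plotting. The sketch below is a dependency-free, fixed-bandwidth weighted Gaussian KDE written only for illustration. It is not how statsmodels implements KDEUnivariate, and the name `weighted_gaussian_kde` is hypothetical. It assumes the bandwidth is the standard deviation of each Gaussian kernel.

```python
import math


def weighted_gaussian_kde(points, weights, bw):
    """Return a function that evaluates a 1-D weighted Gaussian KDE with fixed bandwidth bw."""
    total = sum(weights)
    norm = 1.0 / (bw * math.sqrt(2.0 * math.pi))

    def density(x):
        # Weighted sum of Gaussian kernels centered on each data point
        return sum(
            (w / total) * norm * math.exp(-0.5 * ((x - p) / bw) ** 2)
            for p, w in zip(points, weights)
        )

    return density


# Same data expressed two ways: repeated points vs. points with weights
repeated = weighted_gaussian_kde([10.0, 10.0, 10.0, 5.0], [1.0] * 4, bw=0.5)
weighted = weighted_gaussian_kde([10.0, 5.0], [3.0, 1.0], bw=0.5)

grid = [3.0 + 0.1 * i for i in range(91)]  # evaluation points from 3.0 to 12.0
max_diff = max(abs(repeated(x) - weighted(x)) for x in grid)
print(max_diff)
```

Under these assumptions the two estimates agree up to floating-point rounding, which is the behavior one would expect from integer weights that stand in for repeated observations.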
Answer 3:
Check out the packages PyQt-Fit and statistics for Python. They seem to support kernel density estimation with weighted observations.
Source: https://stackoverflow.com/questions/27623919/weighted-gaussian-kernel-density-estimation-in-python