Update: Weighted samples are now supported by scipy.stats.gaussian_kde (via the weights argument, available since SciPy 1.2). See here and here for details.
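With a recent SciPy, the weights keyword can be passed directly to gaussian_kde; a minimal sketch (assumes SciPy 1.2 or later):

```python
import numpy as np
from scipy.stats import gaussian_kde

# the sample at 10 is weighted three times as heavily as the sample at 5
kde = gaussian_kde(np.array([10., 5.]), weights=np.array([3., 1.]))

# evaluate the weighted density estimate on a grid
xs = np.linspace(-10, 25, 500)
density = kde(xs)
```

The bandwidth is still chosen by Scott's rule by default; pass bw_method to override it.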
It was previously not possible to use scipy.stats.gaussian_kde with weighted samples.
For univariate distributions you can use KDEUnivariate from statsmodels. It is not well documented, but the fit method accepts a weights argument. The catch is that you then cannot use FFT. Here is an example:
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.kde import KDEUnivariate

# four unweighted samples: the value 10. appears three times
kde1 = KDEUnivariate(np.array([10., 10., 10., 5.]))
kde1.fit(bw=0.5)
plt.plot(kde1.support, [kde1.evaluate(xi) for xi in kde1.support], 'x-')

# two weighted samples; FFT must be disabled when weights are used
kde2 = KDEUnivariate(np.array([10., 5.]))
kde2.fit(weights=np.array([3., 1.]), bw=0.5, fft=False)
plt.plot(kde2.support, [kde2.evaluate(xi) for xi in kde2.support], 'o-')
which produces this figure:
Neither sklearn.neighbors.KernelDensity nor statsmodels.nonparametric seems to support weighted samples. I modified scipy.stats.gaussian_kde to allow for heterogeneous sampling weights and thought the results might be useful for others. An example is shown below.
An IPython notebook can be found here: http://nbviewer.ipython.org/gist/tillahoffmann/f844bce2ec264c1c8cb5
The weighted arithmetic mean is

$$\bar{x} = \frac{\sum_i w_i x_i}{\sum_i w_i}.$$

The unbiased data covariance matrix is then given by

$$\Sigma = \frac{\sum_i w_i \left(x_i - \bar{x}\right)\left(x_i - \bar{x}\right)^T}{\sum_i w_i - \sum_i w_i^2 \big/ \sum_i w_i}.$$
The bandwidth can be chosen by the Scott or Silverman rules as in scipy. However, the number of samples used to calculate the bandwidth is Kish's approximation for the effective sample size, $n_{\mathrm{eff}} = \left(\sum_i w_i\right)^2 \big/ \sum_i w_i^2$.
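Putting those pieces together, here is a minimal sketch of a 1-D weighted Gaussian KDE built from the quantities above: weighted mean, unbiased weighted variance, and Kish's effective sample size plugged into Scott's rule. The function name weighted_kde is mine, not from the notebook, and this is a simplification of the multivariate case described there:

```python
import numpy as np

def weighted_kde(x, weights, grid):
    """Minimal 1-D weighted Gaussian KDE sketch (illustrative, not the notebook code)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                               # normalise the weights
    mean = np.sum(w * x)                          # weighted arithmetic mean
    # unbiased weighted variance (denominator 1 - sum of squared weights)
    var = np.sum(w * (x - mean) ** 2) / (1.0 - np.sum(w ** 2))
    n_eff = 1.0 / np.sum(w ** 2)                  # Kish's effective sample size
    # Scott's rule, with n_eff in place of the raw sample count
    bw = n_eff ** (-1.0 / 5.0) * np.sqrt(var)
    # density estimate: weighted sum of Gaussian kernels
    u = (grid[:, None] - x[None, :]) / bw
    return np.sum(w * np.exp(-0.5 * u ** 2), axis=1) / (bw * np.sqrt(2.0 * np.pi))

grid = np.linspace(0., 15., 200)
dens = weighted_kde(np.array([10., 5.]), np.array([3., 1.]), grid)
```

With equal weights this reduces to an ordinary Gaussian KDE, since n_eff then equals the sample count.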
Check out the packages PyQt-Fit and statistics for Python. They seem to have kernel density estimation with weighted observations.