Weighted Gaussian kernel density estimation in `python`

野的像风 2020-12-14 21:32

Update: Weighted samples are now supported by scipy.stats.gaussian_kde. See here and here for details.

It is currently not possible to use scipy.stats.gaussian_kde to estimate the density of a random variable based on weighted samples. What methods are available to estimate densities of continuous random variables based on weighted samples?
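
For reference, here is a minimal sketch of the usage mentioned in the update above (assuming SciPy 1.2 or later, where gaussian_kde gained a weights argument; the data below is invented for illustration):

    import numpy as np
    from scipy.stats import gaussian_kde

    # Toy data: samples with unequal importance weights.
    samples = np.array([1.0, 2.0, 2.5, 4.0, 5.0])
    weights = np.array([0.5, 1.0, 3.0, 1.0, 0.5])

    # The weights are normalized internally; bandwidth selection works as usual.
    kde = gaussian_kde(samples, weights=weights, bw_method="scott")

    grid = np.linspace(0.0, 6.0, 200)
    density = kde(grid)   # evaluate the weighted density estimate on a grid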

3 Answers
  • 2020-12-14 21:51

    For univariate distributions you can use KDEUnivariate from statsmodels. It is not well documented, but the fit method accepts a weights argument. Note that FFT-based estimation cannot be used together with weights, so pass fft=False. Here is an example:

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.nonparametric.kde import KDEUnivariate

    # Unweighted KDE: the value 10. is simply repeated three times.
    kde1 = KDEUnivariate(np.array([10., 10., 10., 5.]))
    kde1.fit(bw=0.5)
    plt.plot(kde1.support, [kde1.evaluate(xi) for xi in kde1.support], 'x-')

    # Weighted KDE: the same distribution expressed with weights 3:1.
    # FFT must be disabled when weights are supplied.
    kde2 = KDEUnivariate(np.array([10., 5.]))
    kde2.fit(weights=np.array([3., 1.]),
             bw=0.5,
             fft=False)
    plt.plot(kde2.support, [kde2.evaluate(xi) for xi in kde2.support], 'o-')
    

    which produces a figure in which the two estimates coincide, since the weights reproduce the repeated samples.

  • 2020-12-14 21:58

    Neither sklearn.neighbors.KernelDensity nor statsmodels.nonparametric appears to support weighted samples. I modified scipy.stats.gaussian_kde to allow for heterogeneous sampling weights and thought the results might be useful to others. An example is shown below.

    (Example figure: see the linked notebook below.)

    An ipython notebook can be found here: http://nbviewer.ipython.org/gist/tillahoffmann/f844bce2ec264c1c8cb5

    Implementation details

    The weighted arithmetic mean is

    $$\bar{x} = \frac{\sum_i w_i x_i}{\sum_i w_i}.$$

    The unbiased data covariance matrix is then given by

    $$\Sigma = \frac{\sum_i w_i \,(x_i - \bar{x})(x_i - \bar{x})^{\mathsf T}}{V_1 - V_2 / V_1}, \qquad V_1 = \sum_i w_i, \quad V_2 = \sum_i w_i^2.$$

    The bandwidth can be chosen by the Scott or Silverman rules as in scipy. However, the number of samples used to calculate the bandwidth is Kish's approximation for the effective sample size, $n_\mathrm{eff} = \left(\sum_i w_i\right)^2 / \sum_i w_i^2$.
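
    As a rough illustration of these formulas (a sketch only, not the code from the linked notebook; the function name and data are made up), the weighted mean, unbiased weighted covariance, Kish effective sample size and Scott factor can be computed with plain NumPy:

        import numpy as np

        def weighted_kde_ingredients(samples, weights):
            """Return the weighted mean, unbiased weighted covariance,
            Kish's effective sample size and the Scott bandwidth factor.

            samples: (n, d) array of n d-dimensional points; weights: (n,) array.
            """
            x = np.atleast_2d(np.asarray(samples, dtype=float))  # shape (n, d)
            w = np.asarray(weights, dtype=float)                  # shape (n,)
            w_sum = w.sum()

            # Weighted arithmetic mean.
            mean = (w[:, None] * x).sum(axis=0) / w_sum

            # Unbiased weighted covariance matrix.
            diff = x - mean
            cov = (w[:, None, None] * diff[:, :, None] * diff[:, None, :]).sum(axis=0)
            cov /= w_sum - (w ** 2).sum() / w_sum

            # Kish's approximation of the effective sample size.
            n_eff = w_sum ** 2 / (w ** 2).sum()

            # Scott's rule, with n replaced by the effective sample size.
            d = x.shape[1]
            scott_factor = n_eff ** (-1.0 / (d + 4))
            return mean, cov, n_eff, scott_factor

        mean, cov, n_eff, factor = weighted_kde_ingredients(
            [[1.0], [2.0], [4.0]], [1.0, 2.0, 1.0])

    The bandwidth (covariance) matrix of the kernel is then the Scott factor squared times the data covariance, as in scipy's gaussian_kde.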

  • 2020-12-14 22:05

    Check out the packages PyQt-Fit and statistics for Python. Both appear to provide kernel density estimation with weighted observations.
