R - simulate data for probability density distribution obtained from kernel density estimate

Posted by 馋奶兔 on 2019-12-04 14:53:31

Using your underlying discrete data, create a kernel density estimate on as fine a grid as you wish, that is, as "close to continuous" as your application needs (within the limits of machine precision and computing time, of course). Then sample from that kernel density, using the density values as sampling weights so that more probable values of your distribution are more likely to be drawn. For example:

Fake data, just to have something to work with in this example:

set.seed(4396)
dat = round(rnorm(1000,100,10))

Create kernel density estimate. Increase n if you want the density estimated on a finer grid of points:

dens = density(dat, n=2^14)

In this case, the density is estimated on a grid of 2^14 points, with a spacing of mean(diff(dens$x))=0.0045 between adjacent points.
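As a quick sanity check on the grid described above, the following sketch inspects the object returned by density(). It assumes the default settings (Gaussian kernel, cut = 3), under which the grid extends three bandwidths beyond the extremes of the data; the names grid.step and area are introduced here for illustration only.

```r
set.seed(4396)
dat = round(rnorm(1000, 100, 10))
dens = density(dat, n = 2^14)

# Number of grid points and the spacing between adjacent points
length(dens$x)
grid.step = mean(diff(dens$x))
grid.step

# With the default cut = 3, the grid runs from min(dat) - 3*bw to max(dat) + 3*bw
dens$bw
range(dens$x)

# A Riemann sum over the grid should be close to 1 for a proper density
area = sum(dens$y) * grid.step
area
```

If area is far from 1, the grid is probably too coarse or too narrow for the data.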

Now, sample from the kernel density estimate. We sample the x-values of the density estimate and set prob equal to its y-values (the densities), so that more probable x-values are more likely to be sampled:

kern.samp = sample(dens$x, 250000, replace=TRUE, prob=dens$y)

Compare dens, the density estimate of our original data (black line), with the density of kern.samp (red line):

plot(dens, lwd=2)
lines(density(kern.samp), col="red",lwd=2)
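As an aside that is not part of the original answer: because density() uses a Gaussian kernel by default and dens$bw is that kernel's standard deviation, a draw from the fitted density can also be generated without any grid, by resampling the data with replacement and adding Gaussian noise (sometimes called a smoothed bootstrap). The sketch below assumes the default kernel; n.samp and smooth.samp are names chosen here for illustration.

```r
set.seed(4396)
dat = round(rnorm(1000, 100, 10))
dens = density(dat, n = 2^14)

n.samp = 250000

# Pick an observed data point at random, then jitter it with the kernel:
# Gaussian noise with mean 0 and standard deviation equal to the bandwidth
smooth.samp = sample(dat, n.samp, replace = TRUE) +
  rnorm(n.samp, mean = 0, sd = dens$bw)

# Visual check against the original density estimate
plot(dens, lwd = 2)
lines(density(smooth.samp), col = "blue", lwd = 2)
```

Values drawn this way are not restricted to dens$x, but the approach only applies as written when the kernel is Gaussian; other kernels would need noise drawn from the matching distribution.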

With the grid-based sampling method, you can make the grid for the density estimate finer and finer, but you'll still be limited to the grid points used for the density estimate (i.e., the values of dens$x). However, if you really need to be able to get the density for any data value, you can create an approximation function. In this case, you would still create the density estimate, at whatever bandwidth and grid size is necessary to capture the structure of the data, and then create a function that interpolates the density between the grid points. For example:

dens = density(dat, n=2^14)

dens.func = approxfun(dens)

x = c(72.4588, 86.94, 101.1058301)

dens.func(x)
[1] 0.001689885 0.017292405 0.040875436

You can use this to obtain the density at any x value within the range of the grid (rather than just at the grid points used by the density function), and then use the output of dens.func as the prob argument to sample.
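One way this last step might look is sketched here. The candidate values x.cand are an assumption made for illustration (uniform draws over the range of the grid); any set of x values inside that range could be used instead. Note that approxfun returns NA outside the range of dens$x by default, and sample does not accept NA weights.

```r
set.seed(4396)
dat = round(rnorm(1000, 100, 10))
dens = density(dat, n = 2^14)
dens.func = approxfun(dens)

# Hypothetical candidate values: arbitrary points inside the grid's range
x.cand = runif(100000, min(dens$x), max(dens$x))

# Interpolated density at each candidate, used as sampling weights
w = dens.func(x.cand)

# Resample the candidates in proportion to their interpolated density
func.samp = sample(x.cand, 50000, replace = TRUE, prob = w)

# Visual check against the original density estimate
plot(dens, lwd = 2)
lines(density(func.samp), col = "red", lwd = 2)
```

The sampled values are still drawn from a finite candidate set, but that set no longer has to coincide with the grid used by density().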
