Question
I need to generate a sample from existing data using kernel density estimates in R. My data contain no negative values (and cannot contain any), but the generated sample does include negative values.
library(ks)
set.seed(1)
par(mfrow=c(2,1))
x<-rlnorm(100)
hist(x, col="red", freq=F)
y <- rkde(fhat=kde(x=x, h=hpi(x)), n=100)
hist(y, col="green", freq=F)
How can I limit the range of the KDE and of the generated sample?
Answer 1:
rkde has a positive argument:
y <- rkde(
fhat = kde(x=x, h=hpi(x)),
n = 100,
positive = TRUE
)
An alternative is to transform the data (e.g., with a logarithm) before the estimation, so that it becomes unconstrained, and to transform the generated values back afterwards.
x2 <- log(x)
y2 <- rkde(fhat=kde(x=x2, h=hpi(x2)), n=100)
y <- exp(y2)
hist(y, col="green", freq=F)
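If you also want to plot the density implied by the log-transform approach on the original scale, a change of variables should do it: if Z = log(X) has density f_Z, then X has density f_Z(log x) / x. The following is only a sketch under that assumption; the grid range (0.01 to 10) and the names grid, fhat2 and dens are arbitrary choices for illustration, not part of the answer above.

```r
library(ks)

set.seed(1)
x  <- rlnorm(100)
x2 <- log(x)

# Evaluation grid on the original (positive) scale; the range is an arbitrary choice
grid <- seq(0.01, 10, length.out = 500)

# KDE of the log-transformed data, evaluated at log(grid)
fhat2 <- kde(x = x2, h = hpi(x2), eval.points = log(grid))

# Change of variables: f_X(x) = f_Z(log(x)) / x
dens <- fhat2$estimate / grid

hist(x, freq = FALSE)
lines(grid, dens, col = "green")
```

The back-transformed curve is zero for x <= 0 by construction, which is why samples obtained as exp(y2) can never be negative.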
Answer 2:
If you can accept a density estimate that is not a KDE, then look at the logspline package. It estimates the density in a different way, and it has arguments to set a lower and/or upper bound so that the resulting estimate does not go beyond the bound and behaves sensibly near it.
Here is a basic example:
set.seed(1)
x<-rlnorm(100)
hist(x, prob=TRUE)
lines(density(x), col='red')
library(ks)
tmp <- kde(x, hpi(x))
lines(tmp$eval.points, tmp$estimate, col='green')
library(logspline)
lsfit <- logspline(x, lbound=0)
curve( dlogspline(x,lsfit), add=TRUE, col='blue' )
curve( dlnorm, add=TRUE, col='orange' )
You can generate new data points from the fitted density using the rlogspline function, and there are also plogspline and qlogspline functions.
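For instance, sampling from the bounded fit might look like the sketch below. It refits the same model as in the example above so that it runs on its own; the sample size of 100 and the variable name y are arbitrary.

```r
library(logspline)

set.seed(1)
x <- rlnorm(100)

# Fit a logspline density with a lower bound at 0
lsfit <- logspline(x, lbound = 0)

# Draw 100 new points from the fitted density
y <- rlogspline(100, lsfit)

# With lbound = 0 the draws should not fall below zero
min(y)

hist(y, col = "green", freq = FALSE)
```

plogspline(q, lsfit) and qlogspline(p, lsfit) give the corresponding distribution function and quantiles of the same fit.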
Source: https://stackoverflow.com/questions/16102048/generation-sample-using-kernel-density-estimates-in-r